In most sciences one generation tears down
what another has built and what one has 
established another undoes. In Mathematics 
alone each generation builds a new story to 
the old structure. 


—Hermann Hankel 


A peculiar beauty reigns in the realm of 
mathematics, a beauty which resembles not 
so much the beauty of art as the beauty of 
nature and which affects the reflective mind, 
which has acquired an appreciation of it, 
very much like the latter. 


—Ernst Kummer 
Preface


It is often averred that two contrasting cultures coexist in mathematics—the theory- 
building culture and the problem-solving culture. The present volume was certainly 
spawned by the latter. This book takes an array of specific problems and solves 
them, with the needed tools developed along the way in the context of the particular 
problems. 

The book is an unusual hybrid. It treats a mélange of topics from combinatorial 
probability theory, multiplicative number theory, random graph theory, and combi- 
natorics. Objectively, what the problems in this book have in common is that they 
involve the asymptotic analysis of a discrete construct, as some natural parameter 
of the system tends to infinity. Subjectively, what these problems have in common 
is that both their statements and their solutions resonate aesthetically with me. 

The results in this book lend themselves to the title “Problems from the Finite to 
the Infinite”; however, with regard to the methods of proof, the chosen appellation 
is the more apt. In particular, generating functions in their various guises are 
a fundamental bridge “from the discrete to the continuous,” as the book’s title 
would have it; such functions work their magic often in these pages. Besides 
bridging discrete mathematics and mathematical analysis, the book makes a modest 
attempt at bridging disciplines—probability, number theory, graph theory, and 
combinatorics. 

In addition to the considerations mentioned above, the problems were selected 
with an eye toward accessibility to a wide audience, including advanced undergrad- 
uate students. The technical prerequisites for the book are a good grounding in basic 
undergraduate analysis, a touch of familiarity with combinatorics, and a little basic 
probability theory. One appendix provides the necessary probabilistic background, 
and another appendix provides a warm-up for dealing with generating functions. 
That said, a moderate dose of the elusive quality known as mathematical maturity 
will certainly be helpful throughout the text and will be necessary on occasion. 

The primary intent of the book is to introduce a number of beautiful problems in 
a variety of subjects quickly, pithily, and completely rigorously to graduate students 
and advanced undergraduates. The book could be used for a seminar/capstone 
course in which students present the lectures. It is hoped that the book might also be
of interest to mathematicians whose fields of expertise are away from the subjects 
treated herein. In light of the primary intended audience, the level of detail in proofs 
is a bit greater than what one sometimes finds in graduate mathematics texts. 

I conclude with some brief comments on the novelty or lack thereof in the 
various chapters. A bit more information in this vein may be found in the chapter 
notes at the end of each chapter. Chapter 1 follows a standard approach to the 
problem it solves. The same is true for Chap. 2 except for the probabilistic proof 
of Theorem 2.1, which I haven’t seen in the literature. The packing problem 
result in Chap. 3 seems to be new, and the proof almost certainly is. My approach 
to the arcsine laws in Chap. 4 is somewhat different from the standard one; it
exploits generating functions to the hilt and is almost completely combinatorial.
The traditional method of proof is considerably more probabilistic. The proofs of
the results in Chap. 5 on the distribution of cycles in random permutations are
almost exclusively combinatorial, through the method of generating functions. In 
particular, the proof of Theorem 5.2 makes quite sophisticated use of this technique. 
In the setting of weighted permutations, it seems that the method of proof offered 
here cannot be found elsewhere. The number theoretic topics in Chaps. 6-8 are 
developed in a standard fashion, although the route has been streamlined a bit to 
provide a rapid approach to the primary goal, namely, the proof of the Hardy–Ramanujan
theorem. In Chap. 9, the proof concerning the number of cliques in a
random graph is more or less standard. The result on tampering detection constitutes 
material with a new twist and the methods are rather probabilistic; a little additional 
probabilistic background and sophistication on the part of the reader would be useful 
here. The results from Ramsey theory are presented in a standard way. Chapter 10, 
which deals with the phase transition concerning the giant component in a sparse 
random graph, is the most demanding technically. The reader with a modicum of 
probabilistic sophistication will be at quite an advantage here. It appears to me that 
a complete proof of the main results in this chapter, with all the details, is not to be 
found in the literature. 


Acknowledgements It is a pleasure to thank my editor, Donna Chernyk, for her professionalism 
and superb diligence. 


Haifa, Israel Ross G. Pinsky 
April 2014 


Contents

1 Partitions with Restricted Summands or “the Money Changing Problem”
2 The Asymptotic Density of Relatively Prime Pairs and of Square-Free Numbers
3 A One-Dimensional Probabilistic Packing Problem
4 The Arcsine Laws for the One-Dimensional Simple Symmetric Random Walk
5 The Distribution of Cycles in Random Permutations
6 Chebyshev’s Theorem on the Asymptotic Density of the Primes
7 Mertens’ Theorems on the Asymptotic Behavior of the Primes
8 The Hardy–Ramanujan Theorem on the Number of Distinct Prime Divisors
9 The Largest Clique in a Random Graph and Applications to Tampering Detection and Ramsey Theory
   9.1 Graphs and Random Graphs: Basic Definitions
   9.2 The Size of the Largest Clique in a Random Graph
   9.3 Detecting Tampering in a Random Graph
   9.4 Ramsey Theory
10 The Phase Transition Concerning the Giant Component in a Sparse Random Graph: A Theorem of Erdős and Rényi
   10.1 Introduction and Statement of Results
   10.2 Construction of the Setup for the Proofs of Theorems 10.1 and 10.2
   10.3 Some Basic Large Deviations Estimates
   10.4 Proof of Theorem 10.1
   10.5 The Galton–Watson Branching Process
   10.6 Proof of Theorem 10.2
Appendix A A Quick Primer on Discrete Probability
Appendix B Power Series and Generating Functions
Appendix C A Proof of Stirling’s Formula
Appendix D An Elementary Proof of \( \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6} \)
References


A Note on Notation 


\( \mathbb{Z} \) denotes the set of integers

\( \mathbb{Z}^+ \) denotes the set of nonnegative integers

\( \mathbb{N} \) denotes the set of natural numbers: \( \{1,2,\dots\} \)

\( \mathbb{R} \) denotes the set of real numbers

\( f(x) = O(g(x)) \) as \( x \to a \) means that \( \limsup_{x\to a} \left|\frac{f(x)}{g(x)}\right| < \infty \); in particular, \( f(x) = O(1) \) as \( x \to a \) means that \( f(x) \) is bounded as \( x \to a \)

\( f(x) = o(g(x)) \) as \( x \to a \) means that \( \lim_{x\to a} \frac{f(x)}{g(x)} = 0 \); in particular, \( f(x) = o(1) \) as \( x \to a \) means \( \lim_{x\to a} f(x) = 0 \)

\( f \sim g \) as \( x \to a \) means that \( \lim_{x\to a} \frac{f(x)}{g(x)} = 1 \)

\( \gcd(x_1,\dots,x_m) \) denotes the greatest common divisor of the positive integers \( x_1,\dots,x_m \)

The symbol \( [\cdot] \) is used in two contexts:

1. \( [n] = \{1,2,\dots,n\} \), for \( n \in \mathbb{N} \)

2. \( [x] \) is the greatest integer function; that is, \( [x] = n \), if \( n \in \mathbb{Z} \) and \( n \le x < n+1 \)

Bin\( (n,p) \) is the binomial distribution with parameters \( n \) and \( p \)

Pois\( (\lambda) \) is the Poisson distribution with parameter \( \lambda \)

Ber\( (p) \) is the Bernoulli distribution with parameter \( p \)

\( X \sim \) Bin\( (n,p) \) means the random variable \( X \) is distributed according to the distribution Bin\( (n,p) \)


Chapter 1 
Partitions with Restricted Summands 
or “the Money Changing Problem”


Imagine a country with coins of denominations 5 cents, 13 cents, and 27 cents. How 
many ways can you make change for $51,419.48? That is, how many solutions 
\( (b_1, b_2, b_3) \) are there to the equation \( 5b_1 + 13b_2 + 27b_3 = 5{,}141{,}948 \), with
the restriction that \( b_1, b_2, b_3 \) be nonnegative integers? This is a specific case of
the following general problem. Fix \( m \) distinct, positive integers \( \{a_j\}_{j=1}^m \). Count the
number of solutions \( (b_1,\dots,b_m) \) with integral entries to the equation
\[ b_1a_1 + b_2a_2 + \cdots + b_ma_m = n, \quad b_j \ge 0,\ j = 1,\dots,m. \qquad (1.1) \]
A partition of \( n \) is a sequence of integers \( (x_1,\dots,x_k) \), where \( k \) is a positive
integer, such that
\[ \sum_{i=1}^k x_i = n \quad\text{and}\quad x_1 \ge x_2 \ge \cdots \ge x_k \ge 1. \]
Let \( P_n \) denote the number of different partitions of \( n \). The problem of obtaining an
asymptotic formula for \( P_n \) is celebrated and very difficult. It was solved in 1918 by
G.H. Hardy and S. Ramanujan, who proved that
\[ P_n \sim \frac{1}{4n\sqrt{3}}\, e^{\pi\sqrt{2n/3}}, \quad \text{as } n \to \infty. \]
Now consider partitions of \( n \) where we restrict the values of the summands \( x_i \) above
to the set \( \{a_j\}_{j=1}^m \). Denote the number of such restricted partitions by \( P_n(\{a_j\}_{j=1}^m) \).
A moment’s thought reveals that the number of solutions to (1.1) is \( P_n(\{a_j\}_{j=1}^m) \).

Does there exist a solution to (1.1) for every sufficiently large integer \( n \)? And
if so, can one evaluate asymptotically the number of such solutions for large \( n \)?
Without posing any restrictions on \( \{a_j\}_{j=1}^m \), the answer to the first question is
negative. For example, if \( m = 3 \) and \( a_1 = 5, a_2 = 10, a_3 = 30 \), then clearly
there is no solution to (1.1) if \( 5 \nmid n \). Indeed, it is clear that a necessary condition
for the existence of a solution for all large \( n \) is that \( \{a_j\}_{j=1}^m \) are relatively prime:
\( \gcd(a_1,\dots,a_m) = 1 \). This is the time to recall a well-known result concerning
solutions \( (b_1,\dots,b_m) \) with (not necessarily nonnegative) integral entries to the
equation \( b_1a_1 + b_2a_2 + \cdots + b_ma_m = n \). A fundamental theorem in algebra/number
theory states that there exists an integral solution to this equation for all \( n \in \mathbb{Z} \) if
and only if \( \gcd(a_1,\dots,a_m) = 1 \). This result has an elegant group theoretical proof.
We will prove that for all large \( n \), (1.1) has a solution \( (b_1,\dots,b_m) \) with integral
entries if and only if \( \gcd(a_1,\dots,a_m) = 1 \), and we will give a precise asymptotic
estimate for the number of such solutions for large \( n \).


Theorem 1.1. Let \( m \ge 2 \) and let \( \{a_j\}_{j=1}^m \) be distinct, positive integers. Assume
that the greatest common divisor of \( \{a_j\}_{j=1}^m \) is 1: \( \gcd(a_1,\dots,a_m) = 1 \). Then for all
sufficiently large \( n \), there exists at least one integral solution to (1.1). Furthermore,
the number \( P_n(\{a_j\}_{j=1}^m) \) of such solutions satisfies
\[ P_n(\{a_j\}_{j=1}^m) \sim \frac{n^{m-1}}{(m-1)!\prod_{j=1}^m a_j}, \quad \text{as } n \to \infty. \qquad (1.2) \]

Remark. In particular, we see (not surprisingly) that for fixed \( m \) and sufficiently
large \( n \), the smaller the product \( \prod_{j=1}^m a_j \) is, the more solutions there are. We also see
that given \( m_1 \) and \( \{a_j^{(1)}\}_{j=1}^{m_1} \), and given \( m_2 \) and \( \{a_j^{(2)}\}_{j=1}^{m_2} \) with \( m_2 > m_1 \), then
for sufficiently large \( n \) there will be more solutions for the latter set of parameters.


Proof. We will prove the asymptotic estimate in (1.2), from which the first statement
of the theorem will also follow. Let \( h_n \) denote the number of solutions to (1.1). (For
the proof, the notation \( h_n \) will be a lot more convenient than \( P_n(\{a_j\}_{j=1}^m) \).) Thus,
we need to show that (1.2) holds with \( h_n \) in place of \( P_n(\{a_j\}_{j=1}^m) \). We define the
generating function of \( \{h_n\}_{n=0}^\infty \), where \( h_0 = 1 \) counts the solution \( b_1 = \cdots = b_m = 0 \)
of (1.1) with \( n = 0 \):
\[ H(x) = \sum_{n=0}^\infty h_n x^n. \qquad (1.3) \]
A simple, rough estimate shows that \( h_n \le \prod_{j=1}^m \frac{n+a_j}{a_j} \), from which it follows that the
power series on the right hand side of (1.3) converges for \( |x| < 1 \). See Exercise 1.1.
It turns out that we can exhibit \( H \) explicitly. We demonstrate this for the case \( m = 2 \),
from which the general case will become clear.

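For \( k = 1, 2 \), we have
\[ \frac{1}{1-x^{a_k}} = \sum_{n=0}^\infty x^{na_k} = 1 + x^{a_k} + x^{2a_k} + x^{3a_k} + \cdots, \]
and the series converges absolutely for \( |x| < 1 \). Thus,
\[ \begin{aligned} \frac{1}{(1-x^{a_1})(1-x^{a_2})} &= (1 + x^{a_1} + x^{2a_1} + x^{3a_1} + \cdots)(1 + x^{a_2} + x^{2a_2} + x^{3a_2} + \cdots) \\ &= (1 + x^{a_2} + x^{2a_2} + x^{3a_2} + \cdots) + (x^{a_1} + x^{a_1+a_2} + x^{a_1+2a_2} + x^{a_1+3a_2} + \cdots) \\ &\quad + (x^{2a_1} + x^{2a_1+a_2} + x^{2a_1+2a_2} + x^{2a_1+3a_2} + \cdots) + \cdots. \end{aligned} \qquad (1.4) \]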


A little thought now reveals that on the right hand side of (1.4), the number of
times the term \( x^n \) appears is the number of integral solutions \( (b_1, b_2) \) to (1.1) with
\( m = 2 \); that is, \( h_n \) is the coefficient of \( x^n \) on the right hand side of (1.4). So
\( H(x) = \frac{1}{(1-x^{a_1})(1-x^{a_2})} \). Clearly, the same argument works for all \( m \); thus we conclude that
\[ H(x) = \frac{1}{(1-x^{a_1})(1-x^{a_2})\cdots(1-x^{a_m})}, \quad |x| < 1. \qquad (1.5) \]
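To make (1.5) concrete, the coefficients \( h_n \) can be computed by multiplying the factors \( \frac{1}{1-x^{a_j}} \) into a truncated power series one at a time. The sketch below is an illustration added here, not part of the original argument, and the helper names `count_solutions` and `brute_force` are hypothetical. It cross-checks the series coefficients against direct enumeration of the solutions of (1.1) for the coins 5, 13, 27 from the opening example.

```python
def count_solutions(coins, n):
    """Return [h_0, ..., h_n], the coefficients of x^0..x^n in 1/prod(1 - x^a)."""
    h = [1] + [0] * n  # power series of the empty product, i.e. the constant 1
    for a in coins:
        # Multiplying a series S by 1/(1 - x^a) gives T with T_k = S_k + T_(k-a);
        # updating h in place from low to high index computes exactly that.
        for k in range(a, n + 1):
            h[k] += h[k - a]
    return h


def brute_force(coins, n):
    """Count nonnegative solutions of sum_j b_j * coins[j] == n directly."""
    def rec(i, remaining):
        if i == len(coins) - 1:
            return 1 if remaining % coins[i] == 0 else 0
        return sum(rec(i + 1, remaining - b * coins[i])
                   for b in range(remaining // coins[i] + 1))
    return rec(0, n)


coins = [5, 13, 27]
h = count_solutions(coins, 200)
assert all(h[n] == brute_force(coins, n) for n in range(201))
```

The in-place update is the usual coin-change dynamic program. Read as series multiplication, it is exactly the expansion (1.4) carried out for all \( m \) factors.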


We now begin an analysis of \( H \), as given in its closed form in (1.5), which will
lead us to the asymptotic behavior as \( n \to \infty \) of the coefficients \( h_n \) in its power
series representation in (1.3). Consider the polynomial
\[ p(x) = (1-x^{a_1})(1-x^{a_2})\cdots(1-x^{a_m}). \]
For each \( k \), the roots of \( 1 - x^{a_k} \) are the \( a_k \)th roots of unity: \( \{e^{2\pi i j/a_k}\}_{j=0}^{a_k-1} \). Clearly 1 is a
root of \( p(x) \) of multiplicity \( m \). Because of the assumption that \( \gcd(a_1,\dots,a_m) = 1 \),
it follows that every other root of \( p(x) \) is of multiplicity less than \( m \); that is, there is
no complex number \( r \) that can be written in the form \( r = e^{2\pi i j_k/a_k} \) simultaneously for
\( k = 1,\dots,m \), where \( 1 \le j_k < a_k \). Indeed, if \( r \) can be written in the above form
for all \( k \), then it follows that the number \( j_k/a_k \in (0,1) \) is independent of \( k \). Writing this
common value in lowest terms as \( s/t \), we have \( t \ge 2 \), and the relation \( j_k t = s a_k \) with
\( \gcd(s,t) = 1 \) shows that \( t \) divides \( a_k \) for every \( k = 1,\dots,m \). Thus any prime factor
of \( t \) is a factor of all of the \( a_k \), and this contradicts the
assumption that \( \gcd(a_1,\dots,a_m) = 1 \).
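The multiplicity argument can also be checked mechanically. A root of unity of exact order \( d \) is a root of \( 1 - x^{a_k} \) exactly when \( d \mid a_k \), so its multiplicity in \( p(x) \) is the number of \( k \) with \( d \mid a_k \). The following sketch is an illustration only, and its helper names are hypothetical. It computes these multiplicities and compares the largest one over \( d > 1 \) with \( m \). The triple 6, 10, 15 shows that pairwise common factors are allowed: every root \( r \ne 1 \) then has multiplicity at most 2 < 3.

```python
from functools import reduce
from math import gcd


def root_multiplicities(a):
    """For each order d, the multiplicity in p(x) = prod_k (1 - x^{a_k}) of every
    primitive d-th root of unity: the number of k with d dividing a_k."""
    orders = sorted({d for ak in a for d in range(1, ak + 1) if ak % d == 0})
    return {d: sum(1 for ak in a if ak % d == 0) for d in orders}


def max_nontrivial_multiplicity(a):
    """Largest multiplicity among the roots r != 1 of p(x), i.e. orders d > 1."""
    mult = root_multiplicities(a)
    return max((count for d, count in mult.items() if d > 1), default=0)


for a in ([5, 13, 27], [6, 10, 15], [5, 10, 30]):
    print(a, "gcd:", reduce(gcd, a),
          "max multiplicity of r != 1:", max_nontrivial_multiplicity(a),
          "m:", len(a))
```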

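Denote the distinct roots of \( p(x) \) by \( 1, r_2, \dots, r_l \), and note from above that
\( |r_j| = 1 \), for all \( j \). Let \( m_k \) denote the multiplicity of the root \( r_k \), for \( k = 2,\dots,l \).
Also, note that \( p(0) = 1 \). Then we can write
\[ (1-x^{a_1})(1-x^{a_2})\cdots(1-x^{a_m}) = (1-x)^m\Big(1-\frac{x}{r_2}\Big)^{m_2}\cdots\Big(1-\frac{x}{r_l}\Big)^{m_l}, \qquad (1.6) \]
where \( 1 \le m_j < m \), for \( j = 2,\dots,l \).

In light of (1.5) and (1.6), we can write the generating function \( H(x) \) in the form
\[ H(x) = \frac{1}{(1-x)^m\big(1-\frac{x}{r_2}\big)^{m_2}\cdots\big(1-\frac{x}{r_l}\big)^{m_l}}. \qquad (1.7) \]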
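By the method of partial fractions, we can rewrite \( H \) from (1.7) in the form
\[ \begin{aligned} H(x) &= \Big(\frac{A_{11}}{1-x} + \frac{A_{12}}{(1-x)^2} + \cdots + \frac{A_{1m}}{(1-x)^m}\Big) + \Big(\frac{A_{21}}{1-\frac{x}{r_2}} + \cdots + \frac{A_{2m_2}}{(1-\frac{x}{r_2})^{m_2}}\Big) \\ &\quad + \cdots + \Big(\frac{A_{l1}}{1-\frac{x}{r_l}} + \cdots + \frac{A_{lm_l}}{(1-\frac{x}{r_l})^{m_l}}\Big). \end{aligned} \qquad (1.8) \]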
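For positive integers \( k \), the function \( F(x) = (1-x)^{-k} \) has the power series
expansion
\[ \frac{1}{(1-x)^k} = \sum_{n=0}^\infty \binom{n+k-1}{k-1} x^n, \quad |x| < 1. \]
To prove this, just verify that \( \frac{F^{(n)}(0)}{n!} = \binom{n+k-1}{k-1} \). Thus, the term \( \frac{A_{1m}}{(1-x)^m} \) on the right
hand side of (1.8) can be expanded as
\[ \frac{A_{1m}}{(1-x)^m} = A_{1m}\sum_{n=0}^\infty \binom{n+m-1}{m-1} x^n. \qquad (1.9) \]
The coefficient of \( x^n \) on the right hand side above is
\[ A_{1m}\binom{n+m-1}{m-1} = A_{1m}\frac{(n+m-1)(n+m-2)\cdots(n+1)}{(m-1)!} \sim A_{1m}\frac{n^{m-1}}{(m-1)!}, \quad \text{as } n \to \infty. \]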


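Every other term on the right hand side of (1.8) is of the form \( \frac{A}{(1-\frac{x}{r})^k} \), where \( 1 \le k < m \)
and \( |r| = 1 \). By the same reasoning as above, the coefficient of \( x^n \) in the
expansion of \( \frac{A}{(1-\frac{x}{r})^k} \) is \( \frac{A}{r^n}\binom{n+k-1}{k-1} \), whose absolute value is \( |A|\binom{n+k-1}{k-1} = O(n^{k-1}) \)
as \( n \to \infty \) (substitute \( \frac{x}{r} \) for \( x \) in the appropriate series expansion). Thus, each of these
terms is of a smaller order than the coefficient of \( x^n \) in (1.9). We thereby conclude that
the coefficient of \( x^n \) in \( H(x) \) is asymptotic to \( A_{1m}\frac{n^{m-1}}{(m-1)!} \) as \( n \to \infty \). By (1.3), this gives
\[ h_n \sim A_{1m}\frac{n^{m-1}}{(m-1)!}, \quad \text{as } n \to \infty. \qquad (1.10) \]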


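It remains to evaluate the constant \( A_{1m} \). From (1.8), it follows that
\[ H(x) = \frac{A_{1m}}{(1-x)^m} + O\big((1-x)^{-(m-1)}\big), \quad \text{as } x \to 1, \]
since every other term on the right hand side of (1.8) is either of the form \( \frac{A_{1k}}{(1-x)^k} \) with
\( k < m \) or bounded near \( x = 1 \). Thus,
\[ \lim_{x\to 1}(1-x)^m H(x) = A_{1m}. \qquad (1.11) \]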
But on the other hand, from (1.5), we have
\[ (1-x)^m H(x) = \frac{(1-x)^m}{\prod_{j=1}^m (1-x^{a_j})} = \prod_{j=1}^m \frac{x-1}{x^{a_j}-1}. \qquad (1.12) \]
Since \( \lim_{x\to 1}\frac{x^{a_j}-1}{x-1} = (x^{a_j})'\big|_{x=1} = a_j \), we conclude from (1.12) that
\[ \lim_{x\to 1}(1-x)^m H(x) = \frac{1}{\prod_{j=1}^m a_j}. \qquad (1.13) \]
From (1.11) and (1.13) we obtain \( A_{1m} = \frac{1}{\prod_{j=1}^m a_j} \), and thus from (1.10) we conclude
that \( h_n \sim \frac{n^{m-1}}{(m-1)!\prod_{j=1}^m a_j} \). □
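A quick numerical sanity check of (1.2) and (1.13) is sketched below. It uses exact coefficients from the coin-change recursion that expands (1.5) as the reference. The ratio of \( h_n \) to the right hand side of (1.2) should drift toward 1, and \( \prod_j \frac{1-x}{1-x^{a_j}} \) should approach \( 1/\prod_j a_j \) as \( x \to 1 \). The function names are hypothetical, the printed values are not reproduced here, and `math.prod` needs Python 3.8 or later.

```python
from math import factorial, prod


def h(coins, n):
    """Coefficient of x^n in 1/prod(1 - x^a), via the coin-change recursion."""
    series = [1] + [0] * n
    for a in coins:
        for k in range(a, n + 1):
            series[k] += series[k - a]
    return series[n]


def leading_term(coins, n):
    """Right hand side of (1.2): n^(m-1) / ((m-1)! * prod_j a_j)."""
    m = len(coins)
    return n ** (m - 1) / (factorial(m - 1) * prod(coins))


coins = [5, 13, 27]
for n in (10**3, 10**4, 10**5):
    print(n, h(coins, n) / leading_term(coins, n))  # ratio should approach 1

# (1.13): (1 - x)^m H(x) = prod_j (1 - x) / (1 - x^{a_j}) -> 1 / prod_j a_j as x -> 1.
x = 0.9999
print(prod((1 - x) / (1 - x ** a) for a in coins), 1 / prod(coins))
```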


Exercise 1.1. If \( b_1a_1 + b_2a_2 + \cdots + b_ma_m = n \), then of course \( b_ja_j \le n \), for
all \( j \in [m] \). Use this to obtain the following rough upper bound on the number of
solutions \( h_n \) to (1.1): \( h_n \le \prod_{j=1}^m \frac{n+a_j}{a_j} \). Then use this estimate together with the third
“fundamental result” in Appendix B to show that the series defining \( H(x) \) in (1.3)
converges for \( |x| < 1 \).


Exercise 1.2. Go through the proof of Theorem 1.1 and convince yourself that the
result of the theorem holds even if the integers \( \{a_j\}_{j=1}^m \) are not distinct. That is,
the number of solutions to (1.1) is asymptotic to the expression on the right hand side
of (1.2). Note though that the number of such solutions is not equal to \( P_n(\{a_j\}_{j=1}^m) \).
What is the leading asymptotic term as n — oo for the number of ways to make n 
cents out of quarters and pennies, where one distinguishes the quarters by their mint 


marks—“p” for Philadelphia, “d” for Denver, and “s” for San Francisco—but where 
the pennies are not distinguished? 


Exercise 1.3. In the case that \( d = \gcd(a_1,\dots,a_m) > 1 \), use Theorem 1.1 to
formulate and prove a corresponding result.


Exercise 1.4. A composition of \( n \) is an ordered sequence of positive integers
\( (x_1,\dots,x_k) \), where \( k \) is a positive integer, such that \( \sum_{j=1}^k x_j = n \). A favorite
method of combinatorists to calculate the size of some combinatorial object is to
find a bijection between the object in question and some other object whose size
is known. Let \( C_n \) denote the number of compositions of \( n \). To calculate \( C_n \), we
construct a bijection as follows. Consider a sequence of \( n \) dots in a row. Between
each pair of adjacent dots, choose to place or choose not to place a vertical line.
Consider the set of all possible dot and line combinations. (For example, if \( n = 5 \),
here are two possible such combinations: (1) ··|·|·· (2) ·····.)


(a) Show that there are \( 2^{n-1} \) dot and line combinations.

(b) Show that there is a bijection between the set of compositions of \( n \) and the set
of dot and line combinations.

(c) Conclude from (a) and (b) that \( C_n = 2^{n-1} \).
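The bijection in part (b) can be written out as a short program. This is offered only as an illustration, and the function name is hypothetical. Each of the \( 2^{n-1} \) bar patterns is read left to right, and the lengths of the maximal runs of dots give the summands. For \( n = 5 \), the pattern ··|·|·· yields the composition (2, 1, 2).

```python
from itertools import product


def compositions_via_bars(n):
    """Yield every composition of n by deciding, for each of the n - 1 gaps
    between n dots in a row, whether a vertical bar goes there."""
    for bars in product([False, True], repeat=n - 1):
        parts, run = [], 1           # run = size of the current block of dots
        for bar in bars:
            if bar:                  # a bar closes the current block
                parts.append(run)
                run = 1
            else:                    # no bar: the next dot joins the block
                run += 1
        parts.append(run)
        yield tuple(parts)


comps = list(compositions_via_bars(5))
assert len(comps) == len(set(comps)) == 2 ** 4     # distinct, and 2^(n-1) of them
assert all(sum(c) == 5 for c in comps)
assert (2, 1, 2) in comps                          # the pattern ··|·|··
```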
Exercise 1.5. Let \( C_n^{\{1,2\}} \) denote the number of compositions of \( n \) with summands
restricted to the integers 1 and 2, that is, compositions \( (x_1,\dots,x_k) \) of \( n \) with the
restriction that \( x_j \in \{1,2\} \), for all \( j \). The series
\[ F(x) := \sum_{n=0}^\infty (x+x^2)^n \qquad (1.14) \]
converges absolutely for \( |x| < \frac{\sqrt5-1}{2} \), since \( |x+x^2| \le |x| + |x|^2 < 1 \) for \( |x| < \frac{\sqrt5-1}{2} \).

(a) Similar to the argument leading from (1.3) to (1.5), argue that \( C_n^{\{1,2\}} \) is the
coefficient of \( x^n \) in the power series expansion of \( F \).

(b) Show that \( F(x) = \sum_{n=0}^\infty f_n x^n \), where \( \{f_n\}_{n=0}^\infty \) is the Fibonacci sequence;
see (B.2) in Appendix B. (Hint: One has \( (x+x^2)F(x) = F(x) - 1 \).)

(c) Conclude from (a) and (b) that \( C_n^{\{1,2\}} \) is the \( n \)th Fibonacci number; thus,
from (B.10) in Appendix B,
\[ C_n^{\{1,2\}} = \frac{1}{\sqrt5}\left(\Big(\frac{1+\sqrt5}{2}\Big)^{n+1} - \Big(\frac{1-\sqrt5}{2}\Big)^{n+1}\right). \]
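The three parts of Exercise 1.5 can be cross-checked numerically. The sketch below uses hypothetical function names and is not from the text. It first counts compositions into 1s and 2s combinatorially: a composition with \( k \) parts equal to 2 has \( n - k \) parts in all, so there are \( \binom{n-k}{k} \) of them. It then reads the series coefficients off the hint's relation and evaluates the closed form of part (c), using the indexing \( f_0 = f_1 = 1 \) that the hint forces. `math.comb` needs Python 3.8 or later.

```python
from math import comb, sqrt


def count_12_compositions(n):
    """Compositions of n into 1s and 2s: with k parts equal to 2 there are
    n - k parts in all, and comb(n - k, k) ways to place the 2s among them."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))


def series_coefficients(n):
    """f_0..f_n read off from the hint (x + x^2) F(x) = F(x) - 1."""
    f = [1] + [0] * n
    for k in range(1, n + 1):
        f[k] = f[k - 1] + (f[k - 2] if k >= 2 else 0)
    return f


def closed_form(n):
    """The formula of part (c), with exponent n + 1 so that f_0 = f_1 = 1."""
    s5 = sqrt(5)
    return (((1 + s5) / 2) ** (n + 1) - ((1 - s5) / 2) ** (n + 1)) / s5


f = series_coefficients(30)
assert all(count_12_compositions(n) == f[n] == round(closed_form(n)) for n in range(31))
```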
Chapter Notes


For a leisurely and folksy introduction to the use of generating functions in 
combinatorics, see Wilf’s little book [34]. For a recent encyclopedic treatment, see 
the book of Flajolet and Sedgewick [20]. The asymptotic formula for \( P_n \), noted at
the beginning of the chapter, was proved by Hardy and Ramanujan in [23]. For a
modern account, see [4]. The asymptotic estimate in Theorem 1.1 is due to I. Schur.
As noted in the text, this asymptotic formula also proves that (1.1) has a solution
for all sufficiently large \( n \). However, this latter fact can be proved more easily; see,
for example, Brauer [11]. Given \( \{a_j\}_{j=1}^m \), what is the exact minimal value of \( n_0 \)
such that every \( n \ge n_0 \) can be written in the form (1.1)? When \( m = 2 \), the answer is
\( (a_1-1)(a_2-1) \). A proof can be found in [34]. For \( m \ge 3 \), no general closed-form answer is known.
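The \( m = 2 \) answer can be checked by brute force. The sketch below uses hypothetical helper names and assumes \( \gcd(a_1,a_2) = 1 \). The scan stops as soon as \( a_1 \) consecutive integers are representable, because adding copies of \( a_1 \) then represents every larger integer, so the first such run begins exactly at \( n_0 \).

```python
def representable(n, a1, a2):
    """True if n = b1*a1 + b2*a2 for some nonnegative integers b1, b2."""
    return any((n - b1 * a1) % a2 == 0 for b1 in range(n // a1 + 1))


def minimal_n0(a1, a2):
    """Smallest n0 such that every n >= n0 is representable (assumes gcd(a1, a2) == 1).
    Once a1 consecutive integers are representable, adding copies of a1 covers
    everything beyond them, so the first such run starts at n0."""
    run, n = 0, 0
    while run < a1:
        run = run + 1 if representable(n, a1, a2) else 0
        n += 1
    return n - a1


for a1, a2 in [(3, 5), (5, 13), (7, 9), (4, 27)]:
    assert minimal_n0(a1, a2) == (a1 - 1) * (a2 - 1)
```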


Chapter 2 
The Asymptotic Density of Relatively Prime 
Pairs and of Square-Free Numbers 


Pick a positive integer at random. What is the probability of it being even? As 
stated, this question is not well posed, because there is no uniform probability 
measure on the set N of positive integers. However, what one can do is fix a 
positive integer n, and choose a number uniformly at random from the finite set 
\( [n] = \{1,\dots,n\} \). Letting \( p_n \) denote the probability that the chosen number was
even, we have \( \lim_{n\to\infty} p_n = \frac12 \), and we say that the asymptotic density of even
numbers is equal to \( \frac12 \).

In this spirit, we ask: if one selects two positive integers at random, what is the 
probability that they are relatively prime? Fixing n, we choose two positive integers 
uniformly at random from [n]. Of course, there are two natural ways to interpret this. 
Do we choose a number uniformly at random from [n] and then choose a second 
number uniformly at random from the remaining \( n - 1 \) integers, or, alternatively,
do we select the second number again from [n], thereby allowing for doubles? The
answer is that it doesn’t matter, because under the second alternative the probability
of getting doubles is only \( \frac1n \), and this doesn’t affect the asymptotic probability. Here
is the theorem we will prove. 


Theorem 2.1. Choose two integers uniformly at random from \( [n] \). As \( n \to \infty \), the
asymptotic probability that they are relatively prime is \( \frac{6}{\pi^2} \approx 0.6079 \).
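Before the proofs, a direct computation of the probability \( q_n \) from (2.1) below shows the convergence; \( q_n \) is taken over pairs \( j < k \) from \( [n] \). This is a brute-force sketch with a hypothetical function name. It is quadratic in \( n \) and meant only for small \( n \).

```python
from math import gcd, pi


def q(n):
    """Fraction of pairs 1 <= j < k <= n with gcd(j, k) == 1, as in (2.1)."""
    coprime = sum(1 for k in range(2, n + 1) for j in range(1, k) if gcd(j, k) == 1)
    return 2 * coprime / (n * (n - 1))


for n in (10, 100, 1000):
    print(n, q(n), 6 / pi ** 2)
```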


We will give two very different proofs of Theorem 2.1: one completely number
theoretic and one completely probabilistic. The number theoretic proof is elegant,
even a little magical. However, it does require the preparation of some basic number
theoretic tools, and it provides little intuition. The number theoretic proof gives the
asymptotic probability as \( \big(\sum_{n=1}^\infty \frac{1}{n^2}\big)^{-1} \). The well-known fact that \( \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6} \) is
proved in Appendix D. The probabilistic proof requires very little preparation; it is
enough to know just the most rudimentary notions from discrete probability theory:
probability space, event, and independence. A heuristic, non-rigorous version of
the probabilistic proof provides a lot of intuition, some of which the reader might
find obscured in the rigorous proof. The probabilistic proof gives the asymptotic
probability as \( \prod_{k=1}^\infty \big(1 - \frac{1}{p_k^2}\big) \), where \( \{p_k\}_{k=1}^\infty \) is an enumeration of the primes. One
then must use the Euler product formula to show that this is equal to \( \big(\sum_{n=1}^\infty \frac{1}{n^2}\big)^{-1} \).
We will first give the number theoretic proof and then give the heuristic and the
rigorous probabilistic proofs.

The number theoretic ideas we develop along the way to our first proof of 
Theorem 2.1 will bring us close to proving another result, which we now describe. 
Every positive integer \( n \ge 2 \) can be factored uniquely as \( n = p_1^{k_1}\cdots p_m^{k_m} \), where
\( m \ge 1 \), \( \{p_j\}_{j=1}^m \) are distinct primes, and \( k_j \in \mathbb{N} \), for \( j \in [m] \). If in this factorization
one has \( k_j = 1 \), for all \( j \in [m] \), then we say that \( n \) is square-free. Thus, an integer
\( n \ge 2 \) is square-free if and only if it is of the form \( n = p_1\cdots p_m \), where \( m \ge 1 \) and
\( \{p_j\}_{j=1}^m \) are distinct primes. The integer 1 is also called square-free. There are 61
square-free positive integers that are no greater than 100:
1, 2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 33, 34, 35, 37, 38, 39, 41, 42, 43,
46, 47, 51, 53, 55, 57, 58, 59, 61, 62, 65, 66, 67, 69, 70, 71, 73, 74, 77, 78, 79, 82, 83, 85, 86,
87, 89, 91, 93, 94, 95, 97.
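The count of 61 can be reproduced by testing each \( n \le 100 \) for divisibility by a square \( k^2 \) with \( k \ge 2 \). What follows is a minimal sketch with a hypothetical helper name.

```python
def is_square_free(n):
    """True if no square of an integer k >= 2 divides n."""
    k = 2
    while k * k <= n:
        if n % (k * k) == 0:
            return False
        k += 1
    return True


square_free = [n for n in range(1, 101) if is_square_free(n)]
print(len(square_free))   # 61, matching the count in the text
```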

Let \( C_n = \{k : 1 \le k \le n,\ k \text{ is square-free}\} \). If \( \lim_{n\to\infty} \frac{|C_n|}{n} \) exists, we call
this limit the asymptotic density of square-free numbers. After giving the number
theoretic proof of Theorem 2.1, we will prove the following theorem.


Theorem 2.2. The asymptotic density of square-free integers is \( \frac{6}{\pi^2} \approx 0.6079 \).


For the number theoretic proof of Theorem 2.1, the first alternative suggested 
above in the second paragraph of this chapter will be more convenient. In fact, once 
we have chosen the two distinct integers, it will be convenient to order them by size; 
thus, we may consider the set \( B_n \) of all possible (and equally likely) outcomes to be
\[ B_n = \{(j,k) : 1 \le j < k \le n\}. \]
Let \( A_n \subset B_n \) denote those pairs which are relatively prime:
\[ A_n = \{(j,k) : 1 \le j < k \le n,\ \gcd(j,k) = 1\}. \]
Then the probability \( q_n \) that the two selected integers are relatively prime is
\[ q_n = \frac{|A_n|}{|B_n|} = \frac{2|A_n|}{n(n-1)}. \qquad (2.1) \]


We proceed to develop a circle of ideas that will facilitate the calculation of
\( \lim_{n\to\infty} q_n \) and thus give a proof of Theorem 2.1. A function \( a : \mathbb{N} \to \mathbb{R} \) is called
an arithmetic function. The Möbius function \( \mu \) is the arithmetic function defined by
\[ \mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ (-1)^m, & \text{if } n = \prod_{j=1}^m p_j, \text{ where } \{p_j\}_{j=1}^m \text{ are distinct primes}; \\ 0, & \text{otherwise}. \end{cases} \]
Thus, for example, we have \( \mu(3) = -1 \), \( \mu(15) = 1 \), and \( \mu(12) = 0 \).
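A direct implementation of \( \mu \) from this definition uses trial division. What follows is a sketch, and the name `mobius` is ours, not the text's.

```python
def mobius(n):
    """Möbius function by trial division: 0 if a square prime factor appears,
    otherwise (-1)^(number of distinct prime factors)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one leftover prime factor
        result = -result
    return result


print(mobius(3), mobius(15), mobius(12))   # -1 1 0
```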




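Given arithmetic functions \( a \) and \( b \), we define their convolution \( a * b \) to be the
arithmetic function satisfying
\[ (a * b)(n) = \sum_{d|n} a(d)\, b\Big(\frac{n}{d}\Big), \quad n \in \mathbb{N}. \]
Clearly, \( a * b = b * a \). The convolution arises naturally in the following context.
Define formally
\[ f(x) = \sum_{n=1}^\infty \frac{a(n)}{n^x} \qquad (2.2) \]
and
\[ g(x) = \sum_{n=1}^\infty \frac{b(n)}{n^x}. \qquad (2.3) \]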


When we say “formally,” what we mean is that we ignore questions of convergence 
and manipulate these infinite series according to the laws of addition, subtraction, 
multiplication, and division, which are valid for series with a finite number of terms 
and for absolutely convergent infinite series. Their formal product is given by 


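\[ \begin{aligned} f(x)g(x) &= \Big(\sum_{d=1}^\infty \frac{a(d)}{d^x}\Big)\Big(\sum_{k=1}^\infty \frac{b(k)}{k^x}\Big) = \sum_{d,k=1}^\infty \frac{a(d)b(k)}{(dk)^x} = \sum_{n=1}^\infty \frac{1}{n^x}\sum_{d,k:\, dk=n} a(d)b(k) \\ &= \sum_{n=1}^\infty \frac{1}{n^x}\sum_{d|n} a(d)\, b\Big(\frac{n}{d}\Big) = \sum_{n=1}^\infty \frac{(a*b)(n)}{n^x}. \end{aligned} \qquad (2.4) \]
If the series on the right hand side of (2.2) and (2.3) are in fact absolutely convergent,
then the series on the right hand side of (2.4) is also absolutely convergent. In
such case, the equality \( \big(\sum_{n=1}^\infty \frac{a(n)}{n^x}\big)\big(\sum_{n=1}^\infty \frac{b(n)}{n^x}\big) = \sum_{n=1}^\infty \frac{(a*b)(n)}{n^x} \) is a rigorous
statement in mathematical analysis.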

An arithmetic function \( a \) is called multiplicative if \( a(nm) = a(n)a(m) \) whenever
\( \gcd(n,m) = 1 \). It follows that if \( a \not\equiv 0 \) is multiplicative, then \( a(1) = 1 \). If \( a \not\equiv 0 \) is
multiplicative, then it is completely determined by its values on the prime powers;
indeed, if \( n = \prod_{j=1}^m p_j^{k_j} \) is the factorization of \( n \) into a product of distinct prime
powers, then \( a(n) = a\big(\prod_{j=1}^m p_j^{k_j}\big) = \prod_{j=1}^m a(p_j^{k_j}) \).
It is trivial to verify that \( \mu \) is multiplicative. For the first proposition below, the
following lemma will be useful.

Lemma 2.1. The arithmetic function \( n \mapsto \sum_{d|n} \mu(d) \) is multiplicative.


Proof. Let \( n \) and \( m \) be positive integers satisfying \( \gcd(n,m) = 1 \). We have
\[ \sum_{d_1|n}\mu(d_1)\sum_{d_2|m}\mu(d_2) = \sum_{d_1|n,\, d_2|m}\mu(d_1)\mu(d_2) = \sum_{d_1|n,\, d_2|m}\mu(d_1d_2) = \sum_{d|nm}\mu(d), \]
where the second equality follows from the fact that \( \mu \) is multiplicative and the fact
that if \( \gcd(n,m) = 1 \), \( d_1|n \) and \( d_2|m \), then \( \gcd(d_1,d_2) = 1 \), while the final equality
follows from the fact that if \( \gcd(n,m) = 1 \) and \( d|nm \), then \( d \) can be written as
\( d = d_1d_2 \) for a unique pair \( d_1, d_2 \) satisfying \( d_1|n \) and \( d_2|m \). (The reader should
verify these facts.) □
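Lemma 2.1 can be spot-checked numerically, although a finite check is of course not a proof. The sketch below uses hypothetical helper names. It implements the convolution \( a * b \) literally from its definition and tests the multiplicativity of \( \mu \) and of \( n \mapsto \sum_{d|n}\mu(d) \) over all coprime pairs with \( nm \le 200 \).

```python
from math import gcd


def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:          # p^2 divides n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def convolve(a, b):
    """Dirichlet convolution: (a * b)(n) = sum over d | n of a(d) * b(n // d)."""
    return lambda n: sum(a(d) * b(n // d) for d in range(1, n + 1) if n % d == 0)


def is_multiplicative(f, limit):
    """Check f(n*m) == f(n)*f(m) for all coprime n, m with n*m <= limit."""
    return all(f(n * m) == f(n) * f(m)
               for n in range(1, limit + 1)
               for m in range(1, limit // n + 1)
               if gcd(n, m) == 1)


def one(n):
    return 1


sum_mu = convolve(mobius, one)      # n -> sum_{d | n} mu(d), as in Lemma 2.1
print(is_multiplicative(mobius, 200), is_multiplicative(sum_mu, 200))
```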


We introduce three more arithmetic functions that will be used in the sequel:
\[ 1(n) = 1, \text{ for all } n; \qquad i(n) = n, \text{ for all } n; \qquad e(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{otherwise}. \end{cases} \]
Note that \( a * e = a \), for all \( a \), and that \( (a * 1)(n) = \sum_{d|n} a(d) \). A key result we
need is the Möbius inversion formula.


Proposition 2.1. Let \( a \) be an arithmetic function. Define \( b = a * 1 \). Then \( a = b * \mu \).

Remark. Written out explicitly, the proposition asserts that if
\[ b(n) := \sum_{d|n} a(d), \]
then \( a(n) = \sum_{d|n} b(d)\,\mu\big(\frac{n}{d}\big) \).


Proof. To prove the proposition, it suffices to prove that
\[ 1 * \mu = e. \qquad (2.5) \]
Indeed, using this along with the easily verified associativity of the convolution, we
have
\[ b * \mu = (a * 1) * \mu = a * (1 * \mu) = a * e = a. \]
We now prove (2.5). We have
\[ (1 * \mu)(n) = (\mu * 1)(n) = \sum_{d|n}\mu(d). \]
By Lemma 2.1, the function \( n \mapsto \sum_{d|n}\mu(d) \) is multiplicative. Clearly, the function \( e \)
is multiplicative. Obviously, \( e(1) = 1 \) and \( e(p^k) = 0 \), for any prime \( p \) and any
positive integer \( k \). We have \( \sum_{d|1}\mu(d) = \mu(1) = 1 \). Thus, since a nonzero,
multiplicative, arithmetic function is completely determined by its values on prime
powers, to complete the proof that \( 1 * \mu = e \), it suffices to show that \( \sum_{d|p^k}\mu(d) = 0 \).
We have \( \sum_{d|p^k}\mu(d) = \sum_{j=0}^k \mu(p^j) = \mu(1) + \mu(p) = 1 - 1 = 0 \). □
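Both (2.5) and Proposition 2.1 lend themselves to a finite check. In the sketch below, the arithmetic function `a` is an arbitrary hypothetical choice, and any integer-valued function would serve equally well.

```python
def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def a(n):
    """An arbitrary (hypothetical) arithmetic function to invert."""
    return (n * n) % 7 - 3


N = 300
# (2.5): (1 * mu)(n) = sum_{d | n} mu(d) equals e(n).
assert all(sum(mobius(d) for d in divisors(n)) == (1 if n == 1 else 0)
           for n in range(1, N + 1))

# Proposition 2.1: with b = a * 1, recover a = b * mu.
b = {n: sum(a(d) for d in divisors(n)) for n in range(1, N + 1)}
assert all(a(n) == sum(b[d] * mobius(n // d) for d in divisors(n))
           for n in range(1, N + 1))
```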


We introduce one final arithmetic function, the well-known Euler \( \phi \)-function:
\[ \phi(n) = |\{j : 1 \le j \le n,\ \gcd(j,n) = 1\}|. \]
That is, \( \phi(n) \) counts the number of positive integers less than or equal to \( n \) which
are relatively prime to \( n \). For our calculation of \( \lim_{n\to\infty} q_n \), we will use a result that
is a corollary of the following proposition.


Proposition 2.2. \( \phi * 1 = i \); that is,
\[ \sum_{d|n}\phi(d) = n. \]
From Proposition 2.2 and the Möbius inversion formula, the following corollary
is immediate.

Corollary 2.1. \( \mu * i = \phi \); that is,
\[ \phi(n) = \sum_{d|n}\mu(d)\frac{n}{d}. \]

Remark. For the proofs of Theorems 2.1 and 2.2, we do not need Proposition 2.2,
but only Corollary 2.1. In Exercise 2.1, the reader is guided through a direct proof of
the corollary. The proof also will reveal why the seemingly strange Möbius function
has such nice properties.


Proof of Proposition 2.2. Let \( d|n \). It is easy to see that \( \phi(d) \) is equal to the number
of \( k \in [n] \) satisfying \( \gcd(k,n) = \frac{n}{d} \). Indeed, \( k \in [n] \) satisfies \( \gcd(k,n) = \frac{n}{d} \) if and
only if \( k = j\frac{n}{d} \), for some \( j \in [d] \) satisfying \( \gcd(d,j) = 1 \). (The reader should
verify this.) Also, clearly, every \( k \in [n] \) satisfies \( \gcd(k,n) = \frac{n}{d} \) for some \( d|n \). The
proposition follows from these facts. □
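Proposition 2.2 and Corollary 2.1 can be verified for small \( n \) by computing \( \phi \) straight from its definition. The sketch below uses hypothetical helper names.

```python
from math import gcd


def phi(n):
    """Euler's phi straight from the definition."""
    return sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)


def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


for n in range(1, 201):
    ds = divisors(n)
    assert sum(phi(d) for d in ds) == n                       # Proposition 2.2
    assert phi(n) == sum(mobius(d) * (n // d) for d in ds)    # Corollary 2.1
```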


Remark. For an alternative proof of Proposition 2.2, exactly in the spirit of 
Lemma 2.1 and the proof of (2.5), see Exercise 2.2. 


We are now in a position to prove Theorem 2.1. 


Number Theoretic Proof of Theorem 2.1. For each \( k \ge 2 \), there are \( \phi(k) \) integers \( j \)
satisfying \( 1 \le j < k \) and \( \gcd(j,k) = 1 \). Thus,
\[ |A_n| = |\{(j,k) : 1 \le j < k \le n,\ \gcd(j,k) = 1\}| = \sum_{k=2}^n \phi(k). \]
Therefore, from (2.1), we have
\[ q_n = \frac{2\sum_{k=2}^n \phi(k)}{n(n-1)}. \]
To calculate (recall that \( \phi(1) = 1 \))
\[ \lim_{n\to\infty} q_n = \lim_{n\to\infty} \frac{2\sum_{k=1}^n \phi(k) - 2}{n(n-1)}, \qquad (2.6) \]
we analyze the behavior of the sum \( \sum_{k=1}^n \phi(k) \) for large \( n \).


Remark. The function \( \phi \) can be written explicitly as
\[ \phi(n) = n\prod_{p|n}\Big(1 - \frac{1}{p}\Big), \quad n \ge 2, \qquad (2.7) \]
where \( \prod_{p|n} \) indicates that the product is over all primes that divide \( n \); see
Exercise 2.3. However, this formula is of no help whatsoever for analyzing the above
sum.


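We will use Corollary 2.1 to analyze \( \sum_{k=1}^n \phi(k) \). From Corollary 2.1, we have
\[ \sum_{k=1}^n \phi(k) = \sum_{k=1}^n (\mu * i)(k) = \sum_{k=1}^n \sum_{d|k}\mu(d)\frac{k}{d} = \sum_{k=1}^n \sum_{dd'=k}\mu(d)\,d' = \sum_{d \le n}\mu(d)\sum_{d' \le \frac{n}{d}} d'. \]
Since \( \sum_{j=1}^m j = \frac12 m(m+1) \), we have
\[ \sum_{k=1}^n \phi(k) = \sum_{d \le n}\mu(d)\sum_{d' \le \frac{n}{d}} d' = \frac12\sum_{d=1}^n \mu(d)\Big[\frac{n}{d}\Big]\Big(\Big[\frac{n}{d}\Big]+1\Big). \qquad (2.8) \]
We have \( \frac12[\frac{n}{d}]([\frac{n}{d}]+1) \le \frac12\frac{n}{d}(\frac{n}{d}+1) = \frac12(\frac{n}{d})^2 + \frac{n}{2d} \), and \( \frac12[\frac{n}{d}]([\frac{n}{d}]+1) \ge \frac12(\frac{n}{d}-1)\frac{n}{d} =
\frac12(\frac{n}{d})^2 - \frac{n}{2d} \); thus,
\[ \Big|\frac12\Big[\frac{n}{d}\Big]\Big(\Big[\frac{n}{d}\Big]+1\Big) - \frac12\Big(\frac{n}{d}\Big)^2\Big| \le \frac{n}{2d}. \]
Substituting this two-sided inequality in (2.8) and using \( |\mu(d)| \le 1 \), we obtain
\[ \Big|\sum_{k=1}^n \phi(k) - \frac{n^2}{2}\sum_{d=1}^n \frac{\mu(d)}{d^2}\Big| \le \frac{n}{2}\sum_{d=1}^n \frac{1}{d}. \qquad (2.9) \]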
=1 
Now
\[ \sum_{d=1}^n \frac{1}{d} = 1 + \sum_{d=2}^n \frac{1}{d} \le 1 + \log n, \qquad (2.10) \]
since the final sum is a lower Riemann sum for \( \int_1^n \frac{1}{x}\,dx \). From (2.9) and (2.10), we
obtain
\[ \lim_{n\to\infty} \frac{\sum_{k=1}^n \phi(k) - 1}{n(n-1)} = \frac12\sum_{d=1}^\infty \frac{\mu(d)}{d^2}. \qquad (2.11) \]


It remains to evaluate $\sum_{d=1}^\infty \frac{\mu(d)}{d^2}$. On the face of it, from the definition of $\mu$, it would seem very difficult to evaluate this explicitly. However, Möbius inversion saves the day. Consider (2.2)–(2.4) with $a = 1$ and $b = \mu$ and with $x = 2$. With these choices, the right hand sides of (2.2) and (2.3) are absolutely convergent. By (2.5), we have $1 * \mu = e$; that is, $a * b = e$. Therefore, we conclude from (2.2)–(2.4) that

$$\Big(\sum_{n=1}^\infty \frac1{n^2}\Big)\Big(\sum_{n=1}^\infty \frac{\mu(n)}{n^2}\Big) = 1. \tag{2.12}$$

By a celebrated result of Euler,

$$\sum_{n=1}^\infty \frac1{n^2} = \frac{\pi^2}{6}. \tag{2.13}$$

We give a completely elementary proof of this fact in Appendix D. From (2.12) and (2.13) we obtain


$$\sum_{d=1}^\infty \frac{\mu(d)}{d^2} = \frac{6}{\pi^2}. \tag{2.14}$$

Using (2.14) with (2.11) and (2.6) gives

$$\lim_{n\to\infty} q_n = \frac{6}{\pi^2},$$

completing the proof of the theorem. □


Remark. If $a$ is an arithmetic function and $f$ is a nondecreasing function, we say that the function $f$ is the average order of the arithmetic function $a$ if $\frac1n\sum_{k=1}^n a(k) = f(n) + o(f(n))$. Of course, this doesn't uniquely define $f$; we usually choose a particular such $f$ which has a simple form. From (2.11) and (2.14), it follows that the average order of the Euler $\phi$-function is $\frac{3n}{\pi^2}$.
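For a numerical look at (2.6) and at the average order just described, one can tabulate $\phi$ by a sieve and evaluate $q_n$ directly. This is only an illustrative sketch with our own helper names; by (2.9) and (2.10) the error $|q_n - 6/\pi^2|$ is at most of order $(\log n)/n$, so agreement to roughly three decimal places is what one should expect at $n = 10^4$.

```python
import math


def phi_sieve(n):
    """Return phi[0..n] (Euler's totient) using phi(m) = m * prod_{p | m} (1 - 1/p)."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:  # not yet reduced, so p is prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p
    return phi


def q(n, phi):
    """q_n from (2.6): 2 * sum_{k=2}^n phi(k) / (n (n - 1))."""
    return 2 * sum(phi[2:n + 1]) / (n * (n - 1))


N = 10_000
phi = phi_sieve(N)
target = 6 / math.pi**2
for n in (10, 100, 1_000, 10_000):
    average = sum(phi[1:n + 1]) / n  # (1/n) sum_{k <= n} phi(k)
    print(f"n={n:6d}  q_n={q(n, phi):.6f}  6/pi^2={target:.6f}  "
          f"average/(3n/pi^2)={average / (3 * n / math.pi**2):.6f}")
```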
We now turn to Theorem 2.2.

Proof of Theorem 2.2. From the definition of the Möbius function, it follows that

$$\mu^2(n) = \begin{cases} 1, & \text{if } n \text{ is square-free};\\ 0, & \text{otherwise}. \end{cases} \tag{2.15}$$


Thus, letting

$$A_n = \{j \in [n] : j \text{ is square-free}\},$$

we have

$$|A_n| = \sum_{j=1}^n \mu^2(j). \tag{2.16}$$

To prove the theorem, we need to show that

$$\lim_{n\to\infty} \frac{|A_n|}{n} = \frac{6}{\pi^2}. \tag{2.17}$$

We need the following lemma.


Lemma 2.2.

$$\mu^2(n) = \sum_{k^2 \mid n} \mu(k).$$

Proof. Let $A(n) := \sum_{k^2\mid n} \mu(k)$. If $n$ is square-free, then the only integer $k$ that satisfies $k^2 \mid n$ is $k = 1$. Thus, since $\mu(1) = 1$, we have $A(n) = 1$. On the other hand, if $n$ is not square-free, then $n$ can be written in the form $n = m^2 l$, where $m > 1$ and $l$ is square-free. Now $k^2 \mid m^2 l$ if and only if $k \mid m$. (The reader should verify this.) Thus, we have

$$A(n) = \sum_{k^2\mid n} \mu(k) = \sum_{k^2 \mid m^2 l} \mu(k) = \sum_{k \mid m} \mu(k) = (\mu * 1)(m) = 0,$$

where the last equality follows from (2.5). The lemma now follows from (2.15). □


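Using Lemma 2.2, we have

$$\sum_{j=1}^n \mu^2(j) = \sum_{j=1}^n \sum_{k^2 \mid j} \mu(k). \tag{2.18}$$

If $k^2 > n$, then $\mu(k)$ will not appear on the right hand side of (2.18). If $k^2 \le n$, then $\mu(k)$ will appear on the right hand side of (2.18) $[\frac{n}{k^2}]$ times, namely, when $j = k^2, 2k^2, \ldots, [\frac{n}{k^2}]k^2$. Thus, we have

$$\sum_{j=1}^n \mu^2(j) = \sum_{j=1}^n \sum_{k^2\mid j} \mu(k) = \sum_{k \le [n^{1/2}]} \Big[\frac{n}{k^2}\Big]\mu(k) = n \sum_{k \le [n^{1/2}]} \frac{\mu(k)}{k^2} + \sum_{k \le [n^{1/2}]} \Big(\Big[\frac{n}{k^2}\Big] - \frac{n}{k^2}\Big)\mu(k). \tag{2.19}$$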


Since each summand in the second term on the right hand side of (2.19) is bounded in absolute value by 1, we have

$$\Big|\sum_{k \le [n^{1/2}]} \Big(\Big[\frac{n}{k^2}\Big] - \frac{n}{k^2}\Big)\mu(k)\Big| \le n^{1/2}. \tag{2.20}$$

It follows from (2.16), (2.19), and (2.20) that

$$\lim_{n\to\infty} \frac{|A_n|}{n} = \sum_{k=1}^\infty \frac{\mu(k)}{k^2}.$$

Using this with (2.14) gives (2.17) and completes the proof of the theorem. □
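Lemma 2.2 and the counting identity in (2.19) can likewise be checked by machine. In the sketch below (our own helper names; Python 3.8 or later for `math.isqrt`), `squarefree_count` evaluates (2.16) directly and `squarefree_count_2_19` evaluates the middle expression of (2.19); the derivation above says the two agree exactly for every $n$.

```python
import math


def mobius_sieve(n):
    """Return mu[0..n] (Moebius function) via a linear sieve; mu[0] is unused."""
    mu = [1] * (n + 1)
    mu[0] = 0
    composite = [False] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if not composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu


def lemma_2_2_holds(n, mu):
    """Lemma 2.2: mu(n)^2 equals the sum of mu(k) over k with k^2 | n."""
    return mu[n] ** 2 == sum(mu[k] for k in range(1, math.isqrt(n) + 1) if n % (k * k) == 0)


def squarefree_count(n, mu):
    """|A_n| = sum_{j <= n} mu(j)^2, as in (2.16)."""
    return sum(mu[j] ** 2 for j in range(1, n + 1))


def squarefree_count_2_19(n, mu):
    """The expression sum_{k <= [n^(1/2)]} [n/k^2] mu(k) from (2.19)."""
    return sum((n // (k * k)) * mu[k] for k in range(1, math.isqrt(n) + 1))


N = 10_000
mu = mobius_sieve(N)
print(all(lemma_2_2_holds(n, mu) for n in range(1, N + 1)))
for n in (100, 1_000, 10_000):
    count = squarefree_count(n, mu)
    print(n, count, squarefree_count_2_19(n, mu), count / n, 6 / math.pi**2)
```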


We now give a heuristic probabilistic proof and a rigorous probabilistic proof of 
Theorem 2.1. In the heuristic proof, we put quotation marks around the steps that 
are not rigorous. 


Heuristic Probabilistic Proof of Theorem 2.1. Let $\{p_k\}_{k=1}^\infty$ be an enumeration of the primes. In the spirit described in the first paragraph of the chapter, if we pick a positive integer “at random,” then the “probability” of it being divisible by the prime number $p_k$ is $\frac1{p_k}$. (Of course, this is true also with $p_k$ replaced by an arbitrary positive integer.) If we pick two positive integers “independently,” then the “probability” that they are both divisible by $p_k$ is $\frac1{p_k}\cdot\frac1{p_k} = \frac1{p_k^2}$, by “independence.” So the “probability” that at least one of them is not divisible by $p_k$ is $1 - \frac1{p_k^2}$. The “probability” that a “randomly” selected positive integer is divisible by the two distinct primes, $p_j$ and $p_k$, is $\frac1{p_jp_k} = \frac1{p_j}\cdot\frac1{p_k}$. (The reader should check that this “holds” more generally if $p_j$ and $p_k$ are replaced by an arbitrary pair of relatively prime positive integers, but not otherwise.) Thus, the events of being divisible by $p_j$ and being divisible by $p_k$ are “independent.” Now two “randomly” selected positive integers are relatively prime if and only if, for every $k$, at least one of the integers is not divisible by $p_k$. But since the “probability” that at least one of them is not divisible by $p_k$ is $1 - \frac1{p_k^2}$, and since being divisible by a prime $p_j$ and being divisible by a different prime $p_k$ are “independent” events, the “probability” that the two “randomly” selected positive integers are such that, for every $k$, at least one of them is not divisible by $p_k$ is $\prod_{k=1}^\infty\big(1 - \frac1{p_k^2}\big)$. Thus, this should be the “probability” that two “randomly” selected positive integers are relatively prime. □


Rigorous Probabilistic Proof of Theorem 2.1. For the probabilistic proof, the second alternative suggested in the second paragraph of the chapter will be more convenient. Thus, we choose an integer from $[n]$ uniformly at random and then choose a second integer from $[n]$ uniformly at random. Let $\Omega_n = [n]$. The appropriate probability space on which to analyze the model described above is the space $(\Omega_n \times \Omega_n, P_n)$, where the probability measure $P_n$ on $\Omega_n \times \Omega_n$ is the uniform measure; that is, $P_n(A) = \frac{|A|}{n^2}$, for any $A \subset \Omega_n \times \Omega_n$. The point $(i,j) \in \Omega_n \times \Omega_n$ indicates that the integer $i$ was chosen the first time and the integer $j$ was chosen the second time. Let $C_n$ denote the event that the two selected integers are relatively prime; that is,

$$C_n = \{(i,j) \in \Omega_n \times \Omega_n : \gcd(i,j) = 1\}.$$

Then the probability $q_n$ that the two selected integers are relatively prime is

$$q_n = P_n(C_n) = \frac{|C_n|}{n^2}.$$


Let $\{p_k\}_{k=1}^\infty$ denote the prime numbers arranged in increasing order. (Any enumeration of the primes would do, but for the proof it is more convenient to choose the increasing enumeration.) For each $k \in \mathbb{N}$, let $B^1_{n,k}$ denote the event that the first integer chosen is divisible by $p_k$ and let $B^2_{n,k}$ denote the event that the second integer chosen is divisible by $p_k$. That is,

$$B^1_{n,k} = \{(i,j) \in \Omega_n \times \Omega_n : p_k \mid i\}, \qquad B^2_{n,k} = \{(i,j) \in \Omega_n \times \Omega_n : p_k \mid j\}.$$

Note of course that the above sets are empty if $p_k > n$. The event $B^1_{n,k} \cap B^2_{n,k} = \{(i,j) \in \Omega_n \times \Omega_n : p_k \mid i \text{ and } p_k \mid j\}$ is the event that both selected integers have $p_k$ as a factor. There are $[\frac{n}{p_k}]$ integers in $\Omega_n$ that are divisible by $p_k$, namely, $p_k, 2p_k, \ldots, [\frac{n}{p_k}]p_k$. Thus, there are $[\frac{n}{p_k}]^2$ pairs $(i,j) \in \Omega_n \times \Omega_n$ for which both coordinates are divisible by $p_k$; therefore,

$$P_n(B^1_{n,k} \cap B^2_{n,k}) = \frac{[\frac{n}{p_k}]^2}{n^2}. \tag{2.21}$$


Note that $\cup_{k=1}^\infty (B^1_{n,k} \cap B^2_{n,k}) = \cup_{k=1}^n (B^1_{n,k} \cap B^2_{n,k})$ is the event that the two selected integers have at least one common prime factor. (The equality above follows from the fact that $B^1_{n,k}$ and $B^2_{n,k}$ are clearly empty for $k > n$.) Consequently, $C_n$ can be expressed as

$$C_n = \big(\cup_{k=1}^n (B^1_{n,k} \cap B^2_{n,k})\big)^c = \cap_{k=1}^n (B^1_{n,k} \cap B^2_{n,k})^c,$$

where $A^c := \Omega_n \times \Omega_n - A$ denotes the complement of an event $A \subset \Omega_n \times \Omega_n$. Thus,

$$P_n(C_n) = P_n\big(\cap_{k=1}^n (B^1_{n,k} \cap B^2_{n,k})^c\big). \tag{2.22}$$
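Let $R \le n$ be a positive integer. We have

$$\cap_{k=1}^n (B^1_{n,k} \cap B^2_{n,k})^c = \cap_{k=1}^R (B^1_{n,k} \cap B^2_{n,k})^c - \cup_{k=R+1}^n (B^1_{n,k} \cap B^2_{n,k})$$

and, of course, $\cap_{k=1}^n (B^1_{n,k} \cap B^2_{n,k})^c \subset \cap_{k=1}^R (B^1_{n,k} \cap B^2_{n,k})^c$. Thus,

$$P_n\big(\cap_{k=1}^R (B^1_{n,k} \cap B^2_{n,k})^c\big) - P_n\big(\cup_{k=R+1}^n (B^1_{n,k} \cap B^2_{n,k})\big) \le P_n\big(\cap_{k=1}^n (B^1_{n,k} \cap B^2_{n,k})^c\big) \le P_n\big(\cap_{k=1}^R (B^1_{n,k} \cap B^2_{n,k})^c\big). \tag{2.23}$$

Using the sub-additivity property of probability measures for the first inequality below, and using (2.21) for the equality below, we have

$$P_n\big(\cup_{k=R+1}^n (B^1_{n,k} \cap B^2_{n,k})\big) \le \sum_{k=R+1}^n P_n(B^1_{n,k} \cap B^2_{n,k}) = \sum_{k=R+1}^n \frac{[\frac{n}{p_k}]^2}{n^2} \le \sum_{k=R+1}^\infty \frac1{p_k^2}. \tag{2.24}$$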


Up until now, we have made no assumption on $n$. Now assume that $p_k \mid n$, for $k = 1, \ldots, R$; that is, assume that $n$ is a multiple of $\prod_{k=1}^R p_k$. Denote the set of such $n$ by $D_R$; that is,

$$D_R = \{n \in \mathbb{N} : p_k \mid n \text{ for } k = 1, \ldots, R\}.$$

Recall that the event $B^1_{n,k} \cap B^2_{n,k}$ is the event that both selected integers are divisible by $p_k$. We claim that if $n \in D_R$, then the events $\{B^1_{n,k} \cap B^2_{n,k}\}_{k=1}^R$ are independent. That is, for any subset $I \subset \{1, 2, \ldots, R\}$, one has

$$P_n\big(\cap_{k\in I} (B^1_{n,k} \cap B^2_{n,k})\big) = \prod_{k\in I} P_n(B^1_{n,k} \cap B^2_{n,k}), \quad \text{if } n \in D_R. \tag{2.25}$$

The proof of (2.25) is a straightforward counting exercise and is left as Exercise 2.4. If events $\{A_k\}_{k=1}^R$ are independent, then the complementary events $\{A_k^c\}_{k=1}^R$ are also independent. See Exercise A.3 in Appendix A. Thus, we conclude that

$$P_n\big(\cap_{k=1}^R (B^1_{n,k} \cap B^2_{n,k})^c\big) = \prod_{k=1}^R P_n\big((B^1_{n,k} \cap B^2_{n,k})^c\big), \quad \text{if } n \in D_R. \tag{2.26}$$
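The counting behind (2.25), left as Exercise 2.4, is easy to test exhaustively for small $n$. The hypothetical helpers below count pairs by brute force with exact rational arithmetic. They confirm independence for $n = 30 \in D_3$ and show it failing for $n = 7$, which is not a multiple of $2 \cdot 3$; this illustrates the role of the assumption $n \in D_R$ and is of course no substitute for the proof.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def prob_all_divide_both(n, primes):
    """P_n of the event that every p in `primes` divides both coordinates (brute force)."""
    count = sum(
        1
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if all(i % p == 0 and j % p == 0 for p in primes)
    )
    return Fraction(count, n * n)


def independence_holds(n, primes):
    """Check (2.25) for every nonempty subset I of the given primes."""
    for r in range(1, len(primes) + 1):
        for subset in combinations(primes, r):
            joint = prob_all_divide_both(n, subset)
            product = prod(prob_all_divide_both(n, (p,)) for p in subset)
            if joint != product:
                return False
    return True


print(independence_holds(30, [2, 3, 5]))  # True: 30 lies in D_3
print(independence_holds(7, [2, 3]))      # False: 7 is not a multiple of 6
```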
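By (2.21) we have $P_n\big((B^1_{n,k} \cap B^2_{n,k})^c\big) = 1 - P_n(B^1_{n,k} \cap B^2_{n,k}) = 1 - \frac{[\frac{n}{p_k}]^2}{n^2}$. Thus, from the definition of $D_R$, we have

$$P_n\big((B^1_{n,k} \cap B^2_{n,k})^c\big) = 1 - \frac1{p_k^2}, \quad \text{if } n \in D_R \text{ and } k \le R. \tag{2.27}$$

From (2.22) to (2.24), (2.26), and (2.27), we conclude that

$$\prod_{k=1}^R \Big(1 - \frac1{p_k^2}\Big) - \sum_{k=R+1}^\infty \frac1{p_k^2} \le P_n(C_n) \le \prod_{k=1}^R \Big(1 - \frac1{p_k^2}\Big), \quad \text{for } R \in \mathbb{N} \text{ and } n \in D_R. \tag{2.28}$$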


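We now use (2.28) to obtain an estimate on $P_n(C_n)$ for general $n$. Let $n \ge \prod_{k=1}^R p_k$. Let $n'$ denote the largest integer in $D_R$ which is smaller than or equal to $n$, and let $n''$ denote the smallest integer in $D_R$ which is larger than or equal to $n$. Since $D_R$ is the set of positive multiples of $\prod_{k=1}^R p_k$, we obviously have

$$n' > n - \prod_{k=1}^R p_k \quad \text{and} \quad n'' < n + \prod_{k=1}^R p_k. \tag{2.29}$$

For any $n$, note that $n^2 P_n(C_n) = |C_n|$ is the number of pairs $(i,j) \in \Omega_n \times \Omega_n$ that are relatively prime. Obviously, the number of such pairs is nondecreasing in $n$. Thus $(n')^2 P_{n'}(C_{n'}) \le n^2 P_n(C_n) \le (n'')^2 P_{n''}(C_{n''})$, or equivalently,

$$\Big(\frac{n'}{n}\Big)^2 P_{n'}(C_{n'}) \le P_n(C_n) \le \Big(\frac{n''}{n}\Big)^2 P_{n''}(C_{n''}). \tag{2.30}$$

Since $n', n'' \in D_R$, we conclude from (2.28)–(2.30) that

$$\Big(\frac{n - \prod_{k=1}^R p_k}{n}\Big)^2 \Big(\prod_{k=1}^R \Big(1 - \frac1{p_k^2}\Big) - \sum_{k=R+1}^\infty \frac1{p_k^2}\Big) \le P_n(C_n) \le \Big(\frac{n + \prod_{k=1}^R p_k}{n}\Big)^2 \prod_{k=1}^R \Big(1 - \frac1{p_k^2}\Big). \tag{2.31}$$

Letting $n \to \infty$ in (2.31), we obtain

$$\prod_{k=1}^R \Big(1 - \frac1{p_k^2}\Big) - \sum_{k=R+1}^\infty \frac1{p_k^2} \le \liminf_{n\to\infty} P_n(C_n) \le \limsup_{n\to\infty} P_n(C_n) \le \prod_{k=1}^R \Big(1 - \frac1{p_k^2}\Big). \tag{2.32}$$

Now (2.32) holds for arbitrary $R$, and $\sum_{k=R+1}^\infty \frac1{p_k^2} \to 0$ as $R \to \infty$, being the tail of a convergent series; thus letting $R \to \infty$, we conclude that

$$\lim_{n\to\infty} P_n(C_n) = \prod_{k=1}^\infty \Big(1 - \frac1{p_k^2}\Big). \tag{2.33}$$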



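The celebrated Euler product formula states that

$$\prod_{k=1}^\infty \frac{1}{1 - \frac1{p_k^2}} = \sum_{n=1}^\infty \frac1{n^2}; \tag{2.34}$$

see Exercise 2.5. From (2.33), (2.34), and (2.13), we conclude that

$$\lim_{n\to\infty} q_n = \lim_{n\to\infty} P_n(C_n) = \frac{1}{\sum_{n=1}^\infty \frac1{n^2}} = \frac{6}{\pi^2}. \qquad □$$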
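Numerically, the partial products appearing in (2.28) decrease to $6/\pi^2$. A sketch, assuming plain Python 3.8 or later (the names `primes_up_to`, `partial_euler_product`, and `coprime_fraction` are our own): it prints partial products over the primes up to $m$ and checks the upper bound in (2.28) for $n = 210 = 2\cdot3\cdot5\cdot7 \in D_4$.

```python
import math


def primes_up_to(m):
    """Sieve of Eratosthenes: all primes <= m (m >= 2)."""
    is_prime = bytearray([1]) * (m + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(m) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, m + 1, p)))
    return [p for p in range(m + 1) if is_prime[p]]


def partial_euler_product(m):
    """Product over primes p <= m of (1 - 1/p^2), as in (2.28)."""
    return math.prod(1 - 1 / p**2 for p in primes_up_to(m))


def coprime_fraction(n):
    """P_n(C_n): fraction of ordered pairs in [n] x [n] with gcd 1 (brute force)."""
    hits = sum(1 for i in range(1, n + 1) for j in range(1, n + 1) if math.gcd(i, j) == 1)
    return hits / (n * n)


target = 6 / math.pi**2
for m in (10, 100, 1_000, 100_000):
    print(m, partial_euler_product(m), target)

# Upper bound in (2.28) with R = 4, since 210 = 2 * 3 * 5 * 7 lies in D_4.
print(coprime_fraction(210), partial_euler_product(7))
```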


Exercise 2.1. Give a direct proof of Corollary 2.1. (Hint: The Euler $\phi$-function $\phi(n)$ counts the number of positive integers that are less than or equal to $n$ and relatively prime to $n$. We employ the sieve method, which from the point of view of set theory is the method of inclusion–exclusion. Start with a list of all $n$ integers between 1 and $n$ as potential members of the set of the $\phi(n)$ integers relatively prime to $n$. Let $\{p_j\}_{j=1}^m$ be the prime divisors of $n$. For any such $p_j$, the $\frac{n}{p_j}$ numbers $p_j, 2p_j, \ldots, \frac{n}{p_j}p_j$ are not relatively prime to $n$. So we should strike these numbers from our list. When we do this for each $j$, the remaining numbers on the list are those numbers that are relatively prime to $n$, and the size of the list is $\phi(n)$. Now we haven't necessarily reduced the size of our list to $N_1 := n - \sum_{j=1}^m \frac{n}{p_j}$, because some of the numbers we have deleted might be multiples of two different primes, $p_i$ and $p_j$, in which case they were subtracted above twice. Thus we need to add back to $N_1$ all of the $\frac{n}{p_ip_j}$ multiples of $p_ip_j$, for $i \ne j$. That is, we now have $N_2 := N_1 + \sum_{1 \le i < j \le m} \frac{n}{p_ip_j}$. Continue in this vein.)
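The inclusion–exclusion count in the hint to Exercise 2.1 translates directly into code: summing $(-1)^{|T|}\, n/\prod_{p \in T} p$ over all subsets $T$ of the prime divisors of $n$ is exactly $\sum_{d \mid n} \mu(d)\frac nd$. Below is a minimal sketch with hypothetical helper names, compared against the definition of $\phi$; the loop over subset sizes $r = 0, 1, 2, \ldots$ mirrors the successive corrections $N_1, N_2, \ldots$ in the hint.

```python
from itertools import combinations
from math import gcd, prod


def prime_divisors(n):
    """Distinct prime divisors of n, found by trial division."""
    primes, m, p = [], n, 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)
    return primes


def phi_inclusion_exclusion(n):
    """phi(n) = sum over subsets T of the prime divisors of (-1)^|T| * n / prod(T)."""
    primes = prime_divisors(n)
    total = 0
    for r in range(len(primes) + 1):  # r = 0 gives n; r = 1 gives the N_1 correction; ...
        for subset in combinations(primes, r):
            total += (-1) ** r * (n // prod(subset))
    return total


def phi_by_gcd(n):
    """phi(n) straight from the definition."""
    return sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)


print([phi_inclusion_exclusion(n) for n in range(1, 13)])
```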


Exercise 2.2. This exercise presents an alternative proof of Proposition 2.2:

(a) Show that the arithmetic function $\sum_{d \mid n} \phi(d)$ is multiplicative. Use the fact that $\phi$ is multiplicative; see Exercise 2.3.

(b) Show that $\sum_{d \mid n} \phi(d) = n$, when $n$ is a prime power.

(c) Conclude that Proposition 2.2 holds.


Exercise 2.3. The Chinese remainder theorem states that if $n$ and $m$ are relatively prime positive integers, and $a \in [n]$ and $b \in [m]$, then there exists a unique $c \in [nm]$ such that $c \equiv a \bmod n$ and $c \equiv b \bmod m$. (For a proof, see [27].) Use this to prove that the Euler $\phi$-function is multiplicative. Then use the fact that $\phi$ is multiplicative to prove (2.7).


Exercise 2.4. Prove (2.25). 


Exercise 2.5. Prove the Euler product formula (2.34). (Hint: Let $N_l$ denote the set of positive integers all of whose prime factors are in the set $\{p_k\}_{k=1}^l$. Using the fact that

$$\frac{1}{1 - \frac1{p_k^2}} = \sum_{m=0}^\infty \frac1{p_k^{2m}},$$

for all $k \in \mathbb{N}$, first show that

$$\prod_{k=1}^l \frac{1}{1 - \frac1{p_k^2}} = \sum_{n \in N_l} \frac1{n^2}, \quad \text{for any } l \in \mathbb{N},$$

and then show that the right hand side converges to $\sum_{n=1}^\infty \frac1{n^2}$ as $l \to \infty$.)


Exercise 2.6. Using Theorem 2.1, prove the following result: Let $2 \le d \in \mathbb{N}$. Choose two integers uniformly at random from $[n]$. As $n \to \infty$, the asymptotic probability that their greatest common divisor is $d$ is $\frac{6}{\pi^2 d^2}$.


Exercise 2.7. Give a probabilistic proof of Theorem 2.2. 


Chapter Notes 


It seems that Theorem 2.1 was first proven by E. Cesàro in 1881. A good source for the results in this chapter is Nathanson's book [27]. See also the more advanced treatment of Tenenbaum [33], which contains many interesting and nontrivial exercises. The heuristic probabilistic proof of Theorem 2.1 is well known and can be found readily, for example via a Google search. I am unaware of a rigorous probabilistic proof in the literature.


Chapter 3 
A One-Dimensional Probabilistic Packing 
Problem 


Consider $n$ molecules lined up in a row. From among the $n - 1$ nearest neighbor pairs, select one pair at random and “bond” the two molecules together. Now from all the remaining nearest neighbor pairs, select one pair at random and bond the two molecules together. Continue like this until no nearest neighbor pairs remain. Let $M_{n;2}$ denote the random variable that counts the number of bonded molecules. Let $EM_{n;2}$ denote the expected value of $M_{n;2}$, that is, the average number of bonded molecules. The first thing we would like to do is to compute the limiting average fraction of bonded molecules: $\lim_{n\to\infty} \frac{EM_{n;2}}{n}$. Then we would like to show that $\frac{M_{n;2}}{n}$ is close to this limiting average with high probability as $n \to \infty$; that is, we would like to prove that $M_{n;2}$ satisfies the weak law of large numbers.

Of course, by definition, $EM_{n;2} = \sum_{j=0}^n jP(M_{n;2} = j)$, where $P(M_{n;2} = j)$ is the probability that $M_{n;2}$ is equal to $j$. However, it would be fruitless to pursue this formula to evaluate $EM_{n;2}$ asymptotically because the calculation of $P(M_{n;2} = j)$ is hopelessly complicated. We will solve the problem with the help of generating functions.

Actually, we will consider a slightly more general problem, where the pairs are replaced by $k$-tuples, for some $k \ge 2$. So the problem is as follows. There are $n$ molecules on a line. From among the $n - k + 1$ nearest neighbor $k$-tuples, select one at random and “bond” the $k$ molecules together. Now from among all the remaining nearest neighbor $k$-tuples, select one at random and bond the $k$ molecules together. Continue like this until there are no nearest neighbor $k$-tuples left. Let $M_{n;k}$ denote the random variable that counts the number of bonded molecules, and let $EM_{n;k}$ denote the expected value of $M_{n;k}$. See Fig. 3.1. Here is our result.


Theorem 3.1. For each integer $k \ge 2$,

$$\lim_{n\to\infty} \frac{EM_{n;k}}{n} = k\exp\Big(-2\sum_{j=1}^{k-1}\frac1j\Big)\int_0^1 \exp\Big(2\sum_{j=1}^{k-1}\frac{s^j}{j}\Big)\,ds := p_k. \tag{3.1}$$

Furthermore, $\frac{M_{n;k}}{n}$ satisfies the weak law of large numbers; that is, for all $\epsilon > 0$,

$$\lim_{n\to\infty} P\Big(\Big|\frac{M_{n;k}}{n} - p_k\Big| \ge \epsilon\Big) = 0. \tag{3.2}$$

R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 21

Fig. 3.1 A realization with $n = 21$ and $k = 3$ that gives $M_{21;3} = 15$
Remark 1. Only when $k = 2$ can $p_k$ be calculated explicitly; one obtains $p_2 = 1 - e^{-2} \approx 0.865$. Numerical integration gives $p_3 \approx 0.824$, $p_4 \approx 0.804$, $p_5 \approx 0.792$, $p_{10} \approx 0.770$, $p_{100} \approx 0.750$, $p_{1000} \approx 0.748$, and $p_{10{,}000} \approx 0.748$. The expression $p_k$ seems surprisingly difficult to analyze. We suggest the following open problem to the reader.


Open Problem. Prove that $p_k$ is monotone decreasing and calculate $\lim_{k\to\infty} p_k$.
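The values in Remark 1 come from numerical integration of (3.1). The sketch below shows one way to do this, assuming that composite Simpson's rule is accurate enough for the smooth integrand; the function name `p_k` and the default of 20,000 subintervals are our own choices. For $k = 2$ it should reproduce $1 - e^{-2}$. For large $k$ the integrand becomes steep near $s = 1$, so more subintervals may be needed there.

```python
import math


def p_k(k, intervals=20_000):
    """p_k from (3.1), with the integral over [0, 1] done by composite Simpson's rule."""
    if intervals % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")

    def exponent(s):  # 2 * sum_{j=1}^{k-1} s^j / j
        return 2 * sum(s**j / j for j in range(1, k))

    h = 1 / intervals
    total = math.exp(exponent(0.0)) + math.exp(exponent(1.0))
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * math.exp(exponent(i * h))
    integral = total * h / 3
    return k * math.exp(-exponent(1.0)) * integral  # exponent(1) = 2 * sum_{j<k} 1/j


for k in (2, 3, 4, 5, 10):
    print(k, round(p_k(k), 4))
print(1 - math.exp(-2))  # closed form of p_2
```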


Remark 2. Any molecule that remains unbonded at the end of the nearest neighbor $k$-tuple bonding process occurs in a maximal row of $j$ unbonded molecules, for some $j \in [k-1]$. In the limit as $n \to \infty$, what fraction of molecules ends up in a maximal row of $j$ unbonded molecules? See Exercise 3.2. (In Fig. 3.1, numbering from left to right, molecules #4 and #8 occur in a maximal row of one unbonded molecule, while molecules #15, #16, #20, and #21 occur in a maximal row of two unbonded molecules.)
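A direct Monte Carlo simulation of the bonding process gives an independent check on (3.1) and on the fractions asked about in Remark 2. The sketch below is our own, and all names in it are hypothetical. It keeps the start positions of the still-available $k$-tuples in a list together with an index map, so that a uniform choice and each deletion take constant time, and it reports the bonded fraction together with the fraction of molecules lying in maximal unbonded rows of each length $j \in [k-1]$. Estimates are subject to sampling error, so only agreement to two or three decimal places should be expected.

```python
import math
import random


def simulate_bonding(n, k, rng):
    """One run of the k-tuple bonding process; returns a list of bonded flags."""
    bonded = [False] * n
    # Start positions s of k-tuples s, ..., s+k-1 whose molecules are all still unbonded.
    available = list(range(n - k + 1))
    where = {s: i for i, s in enumerate(available)}  # start -> index in `available`

    def discard(s):
        i = where.pop(s)
        last = available.pop()
        if last != s:  # move the last entry into the vacated slot
            available[i] = last
            where[last] = i

    while available:
        s = available[rng.randrange(len(available))]  # uniform over available k-tuples
        for m in range(s, s + k):
            bonded[m] = True
        # Every k-tuple overlapping s, ..., s+k-1 now contains a bonded molecule.
        for t in range(s - k + 1, s + k):
            if t in where:
                discard(t)
    return bonded


def run_fractions(bonded, k):
    """Bonded fraction, and fractions in maximal unbonded rows of length j = 1..k-1."""
    n = len(bonded)
    in_rows = [0] * k
    run = 0
    for flag in bonded + [True]:  # trailing sentinel closes the last row
        if flag:
            in_rows[run] += run
            run = 0
        else:
            run += 1
    return sum(bonded) / n, [in_rows[j] / n for j in range(1, k)]


def estimate(n, k, trials, seed=0):
    rng = random.Random(seed)
    results = [run_fractions(simulate_bonding(n, k, rng), k) for _ in range(trials)]
    bonded = sum(r[0] for r in results) / trials
    rows = [sum(r[1][j] for r in results) / trials for j in range(k - 1)]
    return bonded, rows


print(estimate(20_000, 2, 10), 1 - math.exp(-2))  # compare with p_2
print(estimate(20_000, 3, 10), 2 * math.exp(-3))  # compare with p_3 and q_{3;2} (Exercise 3.2)
```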


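Proof. For notational convenience, let $H^{(k)}_n = EM_{n;k}$ and $L^{(k)}_n = EM^2_{n;k}$. To prove the theorem, it suffices to show that

$$EM_{n;k} = H^{(k)}_n = p_kn + o(n), \quad \text{as } n \to \infty, \tag{3.3}$$

and that

$$EM^2_{n;k} = L^{(k)}_n = p_k^2n^2 + o(n^2), \quad \text{as } n \to \infty. \tag{3.4}$$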


This method of proof is known as the second moment method. It is clear that (3.1) follows from (3.3). An application of Chebyshev's inequality shows that (3.2) follows from (3.3) and (3.4). To see this, note that if $Z$ is a random variable with expected value $EZ$ and variance $\sigma^2(Z)$, then Chebyshev's inequality states that

$$P(|Z - EZ| \ge \delta) \le \frac{\sigma^2(Z)}{\delta^2},$$

for any $\delta > 0$. Also, $\sigma^2(Z) = EZ^2 - (EZ)^2$. We apply Chebyshev's inequality with $Z = \frac{M_{n;k}}{n}$. Using (3.3) and (3.4), we have

$$EZ = \frac{H^{(k)}_n}{n} = p_k + o(1), \quad \text{as } n \to \infty, \tag{3.5}$$

and

$$\sigma^2(Z) = \frac{L^{(k)}_n}{n^2} - \Big(\frac{H^{(k)}_n}{n}\Big)^2 = p_k^2 + o(1) - (p_k + o(1))^2 = o(1), \quad \text{as } n \to \infty.$$

Thus, we obtain for all $\delta > 0$,

$$P\Big(\Big|\frac{M_{n;k}}{n} - \frac{H^{(k)}_n}{n}\Big| \ge \delta\Big) \le \frac{o(1)}{\delta^2}, \quad \text{as } n \to \infty,$$

or, equivalently,

$$\lim_{n\to\infty} P\Big(\Big|\frac{M_{n;k}}{n} - \frac{H^{(k)}_n}{n}\Big| \ge \delta\Big) = 0, \quad \text{for all } \delta > 0. \tag{3.6}$$


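We now show that (3.2) follows from (3.3) and (3.6). Fix $\epsilon > 0$. We have

$$\Big|\frac{M_{n;k}}{n} - p_k\Big| = \Big|\frac{M_{n;k}}{n} - \frac{H^{(k)}_n}{n} + \frac{H^{(k)}_n}{n} - p_k\Big| \le \Big|\frac{M_{n;k}}{n} - \frac{H^{(k)}_n}{n}\Big| + \Big|\frac{H^{(k)}_n}{n} - p_k\Big|.$$

By (3.3), there exists an $n_\epsilon$ such that $\big|\frac{H^{(k)}_n}{n} - p_k\big| \le \frac\epsilon2$, for $n \ge n_\epsilon$. Thus, for $n \ge n_\epsilon$, a necessary condition for $\big|\frac{M_{n;k}}{n} - p_k\big| \ge \epsilon$ is that $\big|\frac{M_{n;k}}{n} - \frac{H^{(k)}_n}{n}\big| \ge \frac\epsilon2$. Consequently,

$$P\Big(\Big|\frac{M_{n;k}}{n} - p_k\Big| \ge \epsilon\Big) \le P\Big(\Big|\frac{M_{n;k}}{n} - \frac{H^{(k)}_n}{n}\Big| \ge \frac\epsilon2\Big), \quad \text{for } n \ge n_\epsilon.$$

Now (3.2) follows from this and (3.6).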

Our proofs of (3.3) and (3.4) will follow similar lines. Before commencing with the proof of (3.3), we trace its general architecture. Only the first step of the proof involves probability. In this step, we employ probabilistic reasoning to produce a recursion equation that gives $H^{(k)}_n$ in terms of $H^{(k)}_0, H^{(k)}_1, \ldots, H^{(k)}_{n-k}$. In this form, the equation is not useful because as $n \to \infty$, it gives $H^{(k)}_n$ in terms of a growing number of its predecessors. However, defining $S^{(k)}_n = \sum_{j=0}^n H^{(k)}_j$ and using the abovementioned recursion equation, we find that $S^{(k)}_n$ is given in terms of only two of its predecessors. We then construct the generating function $g(t)$ whose coefficients are $\{S^{(k)}_n\}_{n=0}^\infty$. Using the recursion equation for $S^{(k)}_n$, we show that $g$ solves a linear, first-order differential equation. We solve this differential equation to obtain an explicit formula for $g(t)$. This explicit formula reveals that $g$ possesses a singularity at $t = 1$. Exploiting this singularity allows us to evaluate $\lim_{n\to\infty} \frac{S^{(k)}_n}{n^2}$, and then a simple observation allows us to obtain $\lim_{n\to\infty} \frac{H^{(k)}_n}{n}$ from $\lim_{n\to\infty} \frac{S^{(k)}_n}{n^2}$.


We now commence with the proof of (3.3). Note that if we start with $n < k$ molecules, then none of them will get bonded. Thus,

$$H^{(k)}_n = 0, \quad \text{for } n = 0, \ldots, k-1. \tag{3.7}$$


We now derive a recursion relation for $H^{(k)}_n$. The method we use is called first step analysis. We begin with a line of $n \ge k$ unbonded molecules, and in the first step, one of the nearest neighbor $k$-tuples is chosen at random and its $k$ molecules are bonded. In order from left to right, denote the original $n - k + 1$ nearest neighbor $k$-tuples by $\{B_j\}_{j=1}^{n-k+1}$. If $B_j$ was chosen in the first step, then the original row now contains a row of $j - 1$ unbonded molecules to the left of the bonded $k$-tuple $B_j$ and a row of $n + 1 - j - k$ unbonded molecules to the right of $B_j$. To complete the random bonding process, we choose random $k$-tuples from these two sub-rows until there are no more $k$-tuples to choose from. This gives us the following formula for the conditional expectation of $M_{n;k}$ given that $B_j$ was selected first: for $n \ge k$,

$$E(M_{n;k} \mid B_j \text{ selected first}) = k + E(M_{j-1;k} + M_{n+1-j-k;k}) = k + H^{(k)}_{j-1} + H^{(k)}_{n+1-j-k}. \tag{3.8}$$

Of course, for each $j \in [n-k+1]$, the probability that $B_j$ was chosen first is $\frac{1}{n-k+1}$. Thus, we obtain the formula

$$EM_{n;k} = H^{(k)}_n = \sum_{j=1}^{n-k+1} P(B_j \text{ selected first})\, E(M_{n;k} \mid B_j \text{ selected first}) = \frac{1}{n-k+1}\sum_{j=1}^{n-k+1}\big(k + H^{(k)}_{j-1} + H^{(k)}_{n+1-j-k}\big), \quad n \ge k.$$

We can rewrite this as

$$H^{(k)}_n = k + \frac{2}{n-k+1}\sum_{j=0}^{n-k} H^{(k)}_j, \quad n \ge k. \tag{3.9}$$

The above recursion equation is not useful directly because it gives $H^{(k)}_n$ in terms of $n - k + 1$ of its predecessors; we want a recursion equation that expresses a given term in terms of a fixed finite number of its predecessors. To that end, we define

$$S^{(k)}_n = \sum_{j=0}^n H^{(k)}_j. \tag{3.10}$$

Substituting this in (3.9) gives

$$H^{(k)}_n = k + \frac{2}{n-k+1} S^{(k)}_{n-k}, \quad n \ge k. \tag{3.11}$$
Writing (3.7) and (3.11) in terms of $\{S^{(k)}_n\}_{n=0}^\infty$, we obtain

$$S^{(k)}_n = 0, \quad \text{for } n = 0, \ldots, k-1, \tag{3.12}$$

and

$$S^{(k)}_n - S^{(k)}_{n-1} = k + \frac{2}{n-k+1} S^{(k)}_{n-k}, \quad n \ge k. \tag{3.13}$$

This recursion equation has the potential to be useful since it gives $S^{(k)}_n$ in terms of only two of its predecessors, $S^{(k)}_{n-1}$ and $S^{(k)}_{n-k}$. Of course, we have paid a price: we are now working with $S^{(k)}_n$ instead of $H^{(k)}_n$, but this will be dealt with easily. For convenience, we drop the superscript $k$ from $S^{(k)}_n$, $H^{(k)}_n$, and $L^{(k)}_n$ for the rest of the chapter, except in the statement of propositions. We rewrite (3.13) as

$$(n-k+1)S_n = (n-k+1)S_{n-1} + 2S_{n-k} + k(n-k+1), \quad n \ge k. \tag{3.14}$$
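Recursions (3.9) and (3.13) are easy to iterate, and for tiny $n$ both can be compared with an exhaustive average over every possible sequence of choices. In this sketch (helper names are ours), `H_via_3_9` uses (3.9) directly at cost $O(n^2)$, `H_via_S` uses (3.11) and (3.13) at cost $O(n)$, and `H_exhaustive` enumerates the bonding process itself with exact fractions. Passing `float` as `num` switches to floating point for large $n$.

```python
import math
from fractions import Fraction
from functools import lru_cache


def H_via_3_9(n_max, k, num=Fraction):
    """H_0..H_{n_max} from (3.7) and (3.9); O(n^2) because of the running sum."""
    H = [num(0)] * (n_max + 1)
    for n in range(k, n_max + 1):
        H[n] = k + num(2) / (n - k + 1) * sum(H[: n - k + 1], num(0))
    return H


def H_via_S(n_max, k, num=Fraction):
    """Same values via S_n: start from (3.12), then use (3.11) and (3.13)."""
    H = [num(0)] * (n_max + 1)
    S = [num(0)] * (n_max + 1)
    for n in range(k, n_max + 1):
        H[n] = k + num(2) / (n - k + 1) * S[n - k]  # (3.11)
        S[n] = S[n - 1] + H[n]                       # (3.13), equivalently (3.10)
    return H


def H_exhaustive(n, k):
    """E M_{n;k} by averaging over every sequence of uniform choices (tiny n only)."""
    @lru_cache(maxsize=None)
    def expected(state):  # state: tuple of bonded flags
        starts = [s for s in range(n - k + 1) if not any(state[s:s + k])]
        if not starts:
            return Fraction(sum(state))
        total = sum((expected(state[:s] + (True,) * k + state[s + k:]) for s in starts),
                    Fraction(0))
        return total / len(starts)
    return expected((False,) * n)


print(H_via_3_9(8, 2))
print(H_via_S(100_000, 2, float)[-1] / 100_000, 1 - math.exp(-2))  # H_n / n versus p_2
```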


We now define the generating function for $\{S_n\}_{n=0}^\infty$ and use (3.14) to derive a linear, first-order differential equation that is satisfied by this generating function. The generating function $g(t)$ is defined by

$$g(t) = \sum_{n=0}^\infty S_nt^n = \sum_{n=k}^\infty S_nt^n, \tag{3.15}$$

where the second equality follows from (3.12). From the definitions, it follows that $H_n \le n$, and thus $S_n \le \frac12 n(n+1)$. Consequently, the sum on the right hand side of (3.15) converges for $|t| < 1$, with the convergence being uniform for $|t| \le \rho$, for any $\rho \in (0,1)$. It follows then that

$$g'(t) = \sum_{n=k}^\infty nS_nt^{n-1}, \quad |t| < 1. \tag{3.16}$$

Multiply equation (3.14) by $t^n$ and group the terms in the following way:

$$nS_nt^n - (k-1)S_nt^n = (n-1)S_{n-1}t^n - (k-2)S_{n-1}t^n + 2S_{n-k}t^n + k(n-k+1)t^n.$$

Now summing the equation over all $n \ge k$, and appealing to (3.15), (3.16), and (3.12), we obtain the differential equation

$$tg'(t) - (k-1)g(t) = t^2g'(t) - (k-2)tg(t) + 2t^kg(t) + kt\sum_{n=k}^\infty nt^{n-1} - k(k-1)\sum_{n=k}^\infty t^n. \tag{3.17}$$
n=k n=k 
Since $\sum_{n=k}^\infty nt^{n-1}$ is the derivative of $\sum_{n=k}^\infty t^n = \frac{t^k}{1-t}$, it follows that $\sum_{n=k}^\infty nt^{n-1} = \frac{kt^{k-1} - (k-1)t^k}{(1-t)^2}$. Using these facts and doing some algebra, which leads to many cancellations, we obtain

$$kt\sum_{n=k}^\infty nt^{n-1} - k(k-1)\sum_{n=k}^\infty t^n = \frac{kt^k}{(1-t)^2}. \tag{3.18}$$

Substituting this in (3.17), and doing a little algebra, we obtain

$$g'(t) = \frac{(k-1) - (k-2)t + 2t^k}{t(1-t)}\,g(t) + \frac{kt^{k-1}}{(1-t)^3}, \quad 0 < t < 1. \tag{3.19}$$

Note that we have excluded $t = 0$ because we have divided by $t$.

There are two singularities in the above equation, one at $t = 0$ and one at $t = 1$. The singularity at $t = 0$ is removable; indeed, $g(0) = 0$, so the first term on the right hand side of (3.19) can be defined at 0. The singularity at 1, on the other hand, is authentic, and actually contains the solution to our problem; we will just need to “unzip” it.

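The linear, first-order differential equation in (3.19) is written in the form $g'(t) = a(t)g(t) + b(t)$, where

$$a(t) = \frac{(k-1) - (k-2)t + 2t^k}{t(1-t)}, \qquad b(t) = \frac{kt^{k-1}}{(1-t)^3}. \tag{3.20}$$

Let $\epsilon \in (0,1)$ and rewrite the differential equation as

$$\Big(g(t)e^{-\int_\epsilon^t a(s)\,ds}\Big)' = b(t)e^{-\int_\epsilon^t a(s)\,ds}.$$

Integrating from $\epsilon$ to $t \in (\epsilon, 1)$ gives

$$g(t)e^{-\int_\epsilon^t a(s)\,ds} = g(\epsilon) + \int_\epsilon^t b(s)e^{-\int_\epsilon^s a(r)\,dr}\,ds, \quad t \in (\epsilon, 1),$$

which we rewrite as

$$g(t) = g(\epsilon)e^{\int_\epsilon^t a(s)\,ds} + \int_\epsilon^t b(s)e^{\int_s^t a(r)\,dr}\,ds, \quad t \in (\epsilon, 1). \tag{3.21}$$

Since $\lim_{t\to0} ta(t) = k - 1$, there exists a $t_0 > 0$ such that $a(t) \le \frac{k - \frac12}{t}$, for $0 < t \le t_0$. Thus, for $\epsilon < t_0$, one has

$$\int_\epsilon^{t_0} a(r)\,dr \le \int_\epsilon^{t_0} \frac{k - \frac12}{r}\,dr = \log\Big(\frac{t_0}{\epsilon}\Big)^{k - \frac12}.$$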
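By (3.15) we have $g(\epsilon) = O(\epsilon^k)$ as $\epsilon \to 0$. Therefore,

$$\lim_{\epsilon\to0} g(\epsilon)e^{\int_\epsilon^t a(r)\,dr} = \lim_{\epsilon\to0} g(\epsilon)e^{\int_\epsilon^{t_0} a(r)\,dr}e^{\int_{t_0}^t a(r)\,dr} \le e^{\int_{t_0}^t a(r)\,dr}\lim_{\epsilon\to0} g(\epsilon)\Big(\frac{t_0}{\epsilon}\Big)^{k - \frac12} = 0.$$

Thus, letting $\epsilon \to 0$ in (3.21) gives

$$g(t) = \int_0^t b(s)e^{\int_s^t a(r)\,dr}\,ds, \quad 0 < t < 1. \tag{3.22}$$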
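Using partial fractions, one finds that

$$\frac{(k-1) - (k-2)r}{r(1-r)} = \frac{k-1}{r} + \frac{1}{1-r}.$$

We also have

$$\frac{2r^k}{r(1-r)} = \frac{2r^{k-1}}{1-r} = \frac{2}{1-r} - 2\sum_{j=1}^{k-1} r^{j-1}.$$

Thus, we can rewrite $a(r)$ from (3.20) as

$$a(r) = \frac{k-1}{r} + \frac{3}{1-r} - 2\sum_{j=1}^{k-1} r^{j-1}.$$

We then obtain

$$\int^t a(r)\,dr = (k-1)\log t - 3\log(1-t) - 2\sum_{j=1}^{k-1}\frac{t^j}{j},$$

and thus

$$e^{\int_s^t a(r)\,dr} = \Big(\frac ts\Big)^{k-1}\Big(\frac{1-s}{1-t}\Big)^3\exp\Big(-2\sum_{j=1}^{k-1}\frac{t^j - s^j}{j}\Big). \tag{3.23}$$

Substituting this in (3.22) and recalling the definition of $b$ from (3.20), we obtain

$$g(t) = \frac{kt^{k-1}}{(1-t)^3}\exp\Big(-2\sum_{j=1}^{k-1}\frac{t^j}{j}\Big)\int_0^t \exp\Big(2\sum_{j=1}^{k-1}\frac{s^j}{j}\Big)\,ds. \tag{3.24}$$

We see that $g$ has a third-order singularity at $t = 1$. We proceed to “unzip” this singularity to reveal the answer to our problem.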
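The closed form (3.24) can be checked against the power series (3.15) at a point inside the disk of convergence, and the approach of $(1-t)^3g(t)$ to $p_k$ can be watched as $t \to 1$. This is a sketch under our own assumptions: $S_n$ is generated from (3.12) and (3.13) in floating point, the series is truncated at 400 terms (ample at $t = \frac12$, since $S_n \le \frac12 n(n+1)$), the integral in (3.24) uses Simpson's rule, and the function names are hypothetical.

```python
import math


def S_sequence(n_max, k):
    """S_0..S_{n_max} from (3.12) and (3.13), in floating point."""
    S = [0.0] * (n_max + 1)
    for n in range(k, n_max + 1):
        S[n] = S[n - 1] + k + 2.0 / (n - k + 1) * S[n - k]
    return S


def g_closed_form(t, k, intervals=2_000):
    """Right hand side of (3.24); the integral uses Simpson's rule (intervals even)."""
    def poly(s):  # sum_{j=1}^{k-1} s^j / j
        return sum(s**j / j for j in range(1, k))
    h = t / intervals
    total = 1.0 + math.exp(2 * poly(t))  # integrand at s = 0 and s = t
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * math.exp(2 * poly(i * h))
    integral = total * h / 3
    return k * t ** (k - 1) / (1 - t) ** 3 * math.exp(-2 * poly(t)) * integral


k = 3
series = sum(S_n * 0.5**n for n, S_n in enumerate(S_sequence(400, k)))
print(series, g_closed_form(0.5, k))

# (1 - t)^3 g(t) should approach p_k as t -> 1 (Propositions 3.1-3.3).
for t in (0.9, 0.99, 0.999):
    print(t, (1 - t) ** 3 * g_closed_form(t, k))
```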

We have the following proposition which connects the limiting behavior of $H_n$ with that of $S_n$.

Proposition 3.1.

$$\lim_{n\to\infty} \frac{H^{(k)}_n}{n} = L$$

if and only if

$$\lim_{n\to\infty} \frac{S^{(k)}_n}{n^2} = \frac L2.$$

Proof. The proof is immediate from (3.10) and (3.11). □


And we have the following proposition which connects the limiting behavior of $S_n$ with the singularity in $g$ at $t = 1$.

Proposition 3.2. If

$$\lim_{n\to\infty} \frac{S^{(k)}_n}{n^2} = L,$$

then

$$\lim_{t\to1} (1-t)^3 g(t) = 2L.$$

Proof. Since $\lim_{n\to\infty} \frac{S_n}{n^2} = L$, we also have $\lim_{n\to\infty} \frac{S_n}{n(n-1)} = L$. Let $\epsilon > 0$. Choose $n_0$ such that $\big|\frac{S_n}{n(n-1)} - L\big| < \epsilon$, for $n > n_0$. Then recalling (3.15), we have, for $0 < t < 1$,

$$\sum_{n=0}^{n_0} S_nt^n + (L - \epsilon)\sum_{n=n_0+1}^\infty n(n-1)t^n \le g(t) \le \sum_{n=0}^{n_0} S_nt^n + (L + \epsilon)\sum_{n=n_0+1}^\infty n(n-1)t^n. \tag{3.25}$$

Now

$$\sum_{n=0}^\infty n(n-1)t^n = t^2\sum_{n=0}^\infty n(n-1)t^{n-2} = t^2\Big(\frac{1}{1-t}\Big)'' = \frac{2t^2}{(1-t)^3},$$

so

$$\sum_{n=n_0+1}^\infty n(n-1)t^n = \frac{2t^2}{(1-t)^3} - \sum_{n=0}^{n_0} n(n-1)t^n.$$

Substituting this latter equality in (3.25), multiplying by $(1-t)^3$, and letting $t \to 1$, we obtain

$$2L - 2\epsilon \le \liminf_{t\to1} (1-t)^3 g(t) \le \limsup_{t\to1} (1-t)^3 g(t) \le 2L + 2\epsilon.$$

As $\epsilon > 0$ is arbitrary, the proposition follows. □


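In order to exploit Propositions 3.1 and 3.2, we will establish the existence of the limit $\lim_{n\to\infty} \frac{S_n}{n^2}$.

Proposition 3.3. $\lim_{n\to\infty} \frac{S^{(k)}_n}{n^2}$ exists.

Proof. Rewriting the recursion equation for $S_n$ in (3.13) so that only $S_n$ appears on the left hand side, then dividing both sides by $n^2$ and subtracting $\frac{S_{n-1}}{(n-1)^2}$ from both sides, we have

$$\begin{aligned}
\frac{S_n}{n^2} - \frac{S_{n-1}}{(n-1)^2} &= \frac{k}{n^2} + \frac{S_{n-1}}{n^2} - \frac{S_{n-1}}{(n-1)^2} + \frac{2S_{n-k}}{n^2(n-k+1)}\\
&= \frac{k}{n^2} - \frac{(2n-1)S_{n-1}}{n^2(n-1)^2} + \frac{2S_{n-k}}{n^2(n-k+1)}\\
&= \frac{k}{n^2} - \frac{(2n-1)S_{n-1}}{n^2(n-1)^2} + \frac{2S_{n-1}}{n^2(n-k+1)} - \frac{2(H_{n-k+1} + \cdots + H_{n-1})}{n^2(n-k+1)}\\
&= \frac{k}{n^2} + \frac{\big((2k-5)n + 3 - k\big)S_{n-1}}{n^2(n-1)^2(n-k+1)} - \frac{2(H_{n-k+1} + \cdots + H_{n-1})}{n^2(n-k+1)}.
\end{aligned} \tag{3.26}$$

As already noted, from the definitions, we have $H_j \le j$ and $S_j \le \frac12 j(j+1)$. Thus, there exists a $C > 0$ such that

$$\Big|\frac{\big((2k-5)n + 3 - k\big)S_{n-1}}{n^2(n-1)^2(n-k+1)}\Big| \le \frac{C}{n^2} \quad \text{and} \quad \frac{2}{n^2(n-k+1)}(H_{n-k+1} + \cdots + H_{n-1}) \le \frac{C}{n^2}. \tag{3.27}$$

This shows that the right hand side of (3.26) is $O(\frac1{n^2})$ and thus so is the left hand side. Consequently, the telescopic series $\sum_{j=2}^\infty \big(\frac{S_j}{j^2} - \frac{S_{j-1}}{(j-1)^2}\big)$ is convergent. Since

$$\frac{S_n}{n^2} = S_1 + \sum_{j=2}^n \Big(\frac{S_j}{j^2} - \frac{S_{j-1}}{(j-1)^2}\Big),$$

we conclude that $\lim_{n\to\infty} \frac{S_n}{n^2}$ exists. □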


By Propositions 3.1 and 3.3, $\ell := \lim_{n\to\infty} \frac{H_n}{n}$ exists. Then by Propositions 3.1 and 3.2 (with $L = \frac\ell2$), it follows that

$$\lim_{t\to1} (1-t)^3 g(t) = \ell.$$

However, from the explicit formula for $g$ in (3.24), we have

$$\lim_{t\to1} (1-t)^3 g(t) = k\exp\Big(-2\sum_{j=1}^{k-1}\frac1j\Big)\int_0^1 \exp\Big(2\sum_{j=1}^{k-1}\frac{s^j}{j}\Big)\,ds = p_k.$$

Thus, $\ell = p_k$, completing the proof of (3.3).

We now turn to the proof of (3.4). We derive a formula by the method used to obtain (3.8). Recall the discussion preceding (3.8). Note that conditioned on $B_j$ being chosen on the first step, the final state of the $j - 1$ molecules to the left of $B_j$ and the final state of the $n + 1 - j - k$ molecules to the right of $B_j$ are independent of one another. Let $M_{j-1;k;1}$ and $M_{n+1-j-k;k;2}$ be independent random variables distributed according to the distributions of $M_{j-1;k}$ and $M_{n+1-j-k;k}$, respectively. Then similar to (3.8), we have

$$E(M^2_{n;k} \mid B_j \text{ selected first}) = E(k + M_{j-1;k;1} + M_{n+1-j-k;k;2})^2 = k^2 + L_{j-1} + L_{n+1-j-k} + 2kH_{j-1} + 2kH_{n+1-j-k} + 2H_{j-1}H_{n+1-j-k}, \tag{3.28}$$

where the last term comes from the fact that the independence gives $EM_{j-1;k;1}M_{n+1-j-k;k;2} = EM_{j-1;k;1}\,EM_{n+1-j-k;k;2}$.


Thus, similar to the passage from (3.8) to (3.9), we have

$$L_n = k^2 + \frac{2}{n-k+1}\sum_{j=0}^{n-k} L_j + \frac{4k}{n-k+1}\sum_{j=0}^{n-k} H_j + \frac{2}{n-k+1}\sum_{j=0}^{n-k} H_jH_{n-k-j}, \quad \text{for } n \ge k. \tag{3.29}$$

We simplify the above recursion relation by defining

$$R_n = \sum_{j=0}^n L_j.$$

Of course, we have $L_n = 0$, for $n = 0, \ldots, k-1$, and thus,

$$R_n = 0, \quad \text{for } n = 0, \ldots, k-1. \tag{3.30}$$

Recalling (3.10), we can now rewrite (3.29) in the form

$$R_n = R_{n-1} + k^2 + \frac{2}{n-k+1}R_{n-k} + \frac{4k}{n-k+1}S_{n-k} + \frac{2}{n-k+1}\sum_{j=0}^{n-k} H_jH_{n-k-j}, \quad n \ge k. \tag{3.31}$$
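Proposition 3.4.

$$\lim_{n\to\infty} \frac1{n^3}\sum_{j=0}^{n-k} H^{(k)}_jH^{(k)}_{n-k-j} = \frac{p_k^2}{6}.$$

Proof. Let $\epsilon > 0$. Since $\lim_{n\to\infty} \frac{H_n}{n} = p_k$, we can find an $n_\epsilon$ such that $(p_k - \epsilon)n \le H_n \le (p_k + \epsilon)n$, for $n \ge n_\epsilon$. Thus

$$(p_k - \epsilon)^2\sum_{n_\epsilon \le j \le n - n_\epsilon - k} j(n-k-j) + \sum_{\substack{0 \le j < n_\epsilon \text{ or}\\ n - n_\epsilon - k < j \le n - k}} H_jH_{n-k-j} \le \sum_{j=0}^{n-k} H_jH_{n-k-j} \le (p_k + \epsilon)^2\sum_{n_\epsilon \le j \le n - n_\epsilon - k} j(n-k-j) + \sum_{\substack{0 \le j < n_\epsilon \text{ or}\\ n - n_\epsilon - k < j \le n - k}} H_jH_{n-k-j}. \tag{3.32}$$

Since $H_j \le j$, for all $j$, we have

$$\sum_{\substack{0 \le j < n_\epsilon \text{ or}\\ n - n_\epsilon - k < j \le n - k}} H_jH_{n-k-j} \le 2(n_\epsilon + 1)n_\epsilon n. \tag{3.33}$$

(There are at most $2(n_\epsilon + 1)$ summands on the left hand side of (3.33), and each summand, $H_jH_{n-k-j}$, is less than or equal to $n_\epsilon n$.) Using the identity $\sum_{j=1}^m j^2 = \frac16 m(m+1)(2m+1)$, we have

$$\begin{aligned}
\sum_{1 \le j \le n - n_\epsilon - k} j(n-k-j) &= (n-k)\sum_{1 \le j \le n - n_\epsilon - k} j - \sum_{1 \le j \le n - n_\epsilon - k} j^2\\
&= \frac12(n-k)(n - n_\epsilon - k)(n - n_\epsilon - k + 1) - \frac16(n - n_\epsilon - k)(n - n_\epsilon - k + 1)\big(2(n - n_\epsilon - k) + 1\big)\\
&= \frac16 n^3 + o(n^3), \quad \text{as } n \to \infty.
\end{aligned} \tag{3.34}$$

Of course,

$$\sum_{1 \le j \le n_\epsilon} j(n-k-j) \le n\sum_{1 \le j \le n_\epsilon} j \le \frac12 n\,n_\epsilon(n_\epsilon + 1). \tag{3.35}$$

From (3.32)–(3.35), we conclude that

$$\frac16(p_k - \epsilon)^2 \le \liminf_{n\to\infty} \frac1{n^3}\sum_{j=0}^{n-k} H_jH_{n-k-j} \le \limsup_{n\to\infty} \frac1{n^3}\sum_{j=0}^{n-k} H_jH_{n-k-j} \le \frac16(p_k + \epsilon)^2,$$

which completes the proof, since $\epsilon > 0$ is arbitrary. □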
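We can rewrite (3.31) as

$$R_n = \frac{n-k+3}{n-k+1}R_{n-1} + k^2 - \frac{2}{n-k+1}(L_{n-k+1} + \cdots + L_{n-1}) + \frac{4k}{n-k+1}S_{n-k} + \frac{2}{n-k+1}\sum_{j=0}^{n-k} H_jH_{n-k-j}.$$

Since $L_j \le j^2$ and $S_j \le \frac12 j(j+1)$, we conclude from Proposition 3.4 that $R_n$ satisfies an equation of the form

$$R_n = \frac{n-k+3}{n-k+1}R_{n-1} + W_n, \quad \text{where } W_n \text{ satisfies } \lim_{n\to\infty} \frac{W_n}{n^2} = \frac{p_k^2}{3}. \tag{3.36}$$

In Exercise 3.1 the reader is asked to show that if for some $n_0$, the positive sequence $\{R_n\}_{n=n_0}^\infty$ satisfies $R_n \le \frac{n-k+3}{n-k+1}R_{n-1} + cn^2$ $\big(R_n \ge \frac{n-k+3}{n-k+1}R_{n-1} + cn^2\big)$, then $\limsup_{n\to\infty} \frac{R_n}{n^3} \le c$ $\big(\liminf_{n\to\infty} \frac{R_n}{n^3} \ge c\big)$. Using this with (3.36), applied with $c = \frac{p_k^2}{3} \pm \epsilon$ for arbitrary $\epsilon > 0$, we conclude that

$$\lim_{n\to\infty} \frac{R_n}{n^3} = \frac{p_k^2}{3}. \tag{3.37}$$

Writing (3.31) in the form

$$L_n = k^2 + \frac{2}{n-k+1}R_{n-k} + \frac{4k}{n-k+1}S_{n-k} + \frac{2}{n-k+1}\sum_{j=0}^{n-k} H_jH_{n-k-j}, \quad n \ge k, \tag{3.38}$$

dividing both sides of this equation by $n^2$, and using (3.37), Proposition 3.4, and the fact that $S_n$ is on the order $n^2$, we conclude that

$$\lim_{n\to\infty} \frac{L_n}{n^2} = \frac{2p_k^2}{3} + \frac{p_k^2}{3} = p_k^2.$$

This gives (3.4) and completes the proof of Theorem 3.1. □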
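The second moment recursion (3.29) can be tested in the same way as (3.9): exact rational iteration against exhaustive enumeration for small $n$, then floating point for larger $n$ to watch $L_n/n^2$ approach $p_k^2$ and the normalized variance $L_n/n^2 - (H_n/n)^2$ shrink. The names below are our own, and the $O(n^2)$ convolution term in (3.29) keeps the floating-point run to moderate $n$.

```python
import math
from fractions import Fraction
from functools import lru_cache


def first_two_moments(n_max, k, num=Fraction):
    """H_n = E M_{n;k} and L_n = E M_{n;k}^2 from (3.7), (3.9) and (3.29)."""
    H = [num(0)] * (n_max + 1)
    L = [num(0)] * (n_max + 1)
    for n in range(k, n_max + 1):
        m = n - k             # the sums in (3.9) and (3.29) run over j = 0..m
        c = num(1) / (m + 1)  # 1 / (n - k + 1)
        sum_H = sum(H[: m + 1], num(0))
        sum_L = sum(L[: m + 1], num(0))
        conv = sum((H[j] * H[m - j] for j in range(m + 1)), num(0))
        H[n] = k + 2 * c * sum_H
        L[n] = k * k + 2 * c * sum_L + 4 * k * c * sum_H + 2 * c * conv
    return H, L


def exhaustive_moments(n, k):
    """(E M_{n;k}, E M_{n;k}^2) by enumerating every sequence of choices (tiny n only)."""
    @lru_cache(maxsize=None)
    def moments(state):  # state: tuple of bonded flags
        starts = [s for s in range(n - k + 1) if not any(state[s:s + k])]
        if not starts:
            m = sum(state)
            return Fraction(m), Fraction(m * m)
        first = second = Fraction(0)
        for s in starts:
            a, b = moments(state[:s] + (True,) * k + state[s + k:])
            first += a
            second += b
        return first / len(starts), second / len(starts)
    return moments((False,) * n)


N = 1_000
H, L = first_two_moments(N, 2, float)
p2 = 1 - math.exp(-2)
print(L[N] / N**2, (H[N] / N) ** 2, p2**2)  # difference of the first two is Var(M_{N;2}) / N^2
```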


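Exercise 3.1. Show that if for some $n_0$, the positive sequence $\{R_n\}_{n=n_0}^\infty$ satisfies

$$R_n \le \frac{n-k+3}{n-k+1}R_{n-1} + cn^2 \quad \Big(R_n \ge \frac{n-k+3}{n-k+1}R_{n-1} + cn^2\Big),$$

then $\limsup_{n\to\infty} \frac{R_n}{n^3} \le c$ $\big(\liminf_{n\to\infty} \frac{R_n}{n^3} \ge c\big)$.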


Exercise 3.2. Any molecule that remains unbonded at the end of the nearest neigh- 
bor k-tuple bonding process occurs in a maximal row of 7 unbonded molecules, 
for some j € [k — 1]. In the limit as n — oo, what fraction of molecules ends up 
in a maximal row of j unbounded molecules? Let’s denote these fractions by q;.;, 


J € [k — 1]. Of course yi dk:j = 1— De. 
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(a) Let k > 3 and fix 7 € [k — 1]. Consider the following bonding process: 


(b 


(d 


(e 


we 


wm 


wm 


wm 


implement the bonding of nearest neighbor k-tuples as described in the chapter. 
When this process terminates, bond all the unbonded molecules that occur in 
a maximal row of j unbonded molecules, but leave untouched all unbonded 
molecules that occur in a maximal row of i unbonded molecules, for some 
i # j. Let M,.x,; denote the number of bonded molecules at the end of 
the process, and let He) — EM,;x,;- Let sho — er He?, Convince 
yourself that Ak 50 satisfies the recursion equation (3.9) and that it 
satisfies the boundary condition (3.7) with one change, namely He = ds 
instead of H*/) = 0. Thus, {sei ) poe | Satisfies the recursion equation (3.13), 
and in place of the boundary condition (3.12), it satisfies the boundary condition 
S&P) =0,n=0,...,7-1; Se) = j,n=j,...,k-1. 

Lee) = ya see ¢” denote the generating function for ps : 5 
Show that g; solves the differential equation 8; (t) = a(t)g;(t) +b; (0), where 
a is as in (3.20) and 


=fhq=le je Siege a 
=o 


b(t) =b(@)+ 


with b as in (3.20). 
In particular, note that b,_; = b; therefore, g,— satisfies the same differential 
equation satisfied by g. Thus, (3.21) holds for g,_1; that is, 


t t t 
ge-i(t) = ge-i(ejele We" 4 1 b(s)els 44" ds. t € (e,1). 


Use the fact that g,_,(€) = (k — Ie! + O(e*), as € > 0, along with (3.23) 
to show that 


> = iam 2 k-1 
. L a(r)dr _ t —2(t+ Cspot a ) 
lim &k-1(€)e (k — 1) = é . 
(d) Use (c) to show that
$$q_{k;k-1} = (k-1)\,e^{-2\sum_{j=1}^{k-1}\frac{1}{j}}.$$
In particular then, $q_{3;2} = 2e^{-3} \approx 0.0996 \approx 0.100$, and consequently $q_{3;1} = 1 - p_3 - q_{3;2} \approx 1 - 0.8237 - 0.0996 \approx 0.077$.

(e) It is well known that $\lim_{n\to\infty}\big(\sum_{j=1}^n \frac{1}{j} - \log n\big)$ exists; the limit is called Euler's constant and is denoted by $\gamma$. One has $\gamma \approx 0.5772$. For a proof, see, for example, [25]. Show that
$$q_{k;k-1} \sim \frac{e^{-2\gamma}}{k-1},\quad \text{as } k \to \infty.$$
(f) For $j \in [k-2]$, one obtains
$$g_j(t) = g_j(\epsilon)\,e^{\int_\epsilon^t a(r)\,dr} + \int_\epsilon^t b_j(s)\,e^{\int_s^t a(r)\,dr}\,ds,\quad t\in(\epsilon,1). \tag{3.39}$$
Show that since $g_j(\epsilon) = j\epsilon^j + O(\epsilon^{j+1})$, as $\epsilon \to 0$, one has
$$\lim_{\epsilon\to0} g_j(\epsilon)\,e^{\int_\epsilon^t a(r)\,dr} = \infty.$$
On the other hand, since $b_j$ appears instead of $b_{k-1} = b$, show that one also has
$$\lim_{\epsilon\to0}\int_\epsilon^t b_j(s)\,e^{\int_s^t a(r)\,dr}\,ds = -\infty.$$
You are invited to show that the appropriate terms in $g_j(\epsilon)e^{\int_\epsilon^t a(r)\,dr}$ and $\int_\epsilon^t b_j(s)e^{\int_s^t a(r)\,dr}\,ds$ cancel each other out and to obtain a finite limiting expression as $\epsilon \to 0$ on the right-hand side of (3.39). This limiting expression is then also $g_j(t)$. One then has $\lim_{t\to1}(1-t)^2 g_j(t) = p_k + q_{k;j}$, which gives an explicit formula for $q_{k;j}$. The above analysis gets more involved the smaller $j$ is. Try it first for $j = k-2$.
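Parts (d) and (e) can be checked numerically. The Python sketch below is only an illustration under stated assumptions: `q_last` is our own name for the closed form in (d), and Euler's constant is hard-coded to more digits than the text quotes. The sketch confirms that $q_{3;2} = 2e^{-3} \approx 0.0996$ and that $(k-1)\,q_{k;k-1}/e^{-2\gamma}$ approaches 1 as $k$ grows.

```python
import math

# Euler's constant gamma, hard-coded here (the text only quotes gamma ~ 0.5772).
EULER_GAMMA = 0.5772156649015329

def q_last(k):
    """q_{k;k-1} = (k - 1) exp(-2 (1 + 1/2 + ... + 1/(k - 1))), the formula in part (d)."""
    harmonic = sum(1 / j for j in range(1, k))
    return (k - 1) * math.exp(-2 * harmonic)

print(q_last(3), 2 * math.exp(-3))  # part (d) with k = 3: both equal 2 e^{-3}

for k in (10, 100, 1000, 10_000):
    # part (e): this ratio should approach 1 as k grows
    print(k, q_last(k) / (math.exp(-2 * EULER_GAMMA) / (k - 1)))
```

The convergence in (e) is slow: the ratio behaves like $e^{-1/(k-1)}$, because $\sum_{j=1}^{m} \frac1j - \log m - \gamma \approx \frac{1}{2m}$.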


Chapter Notes 


The calculation of (3.1) in the case k = 2 goes back to an article by the Nobel Prize-winning chemist Flory in 1939 [21]. The problem was rediscovered by Page, who obtained the asymptotic behavior for the mean and variance in the case k = 2 [28]. The method used there does not generalize to k > 2. Theorem 3.1 seems to be new. A continuous space version of this problem was considered by Rényi [31].


Chapter 4 
The Arcsine Laws for the One-Dimensional 
Simple Symmetric Random Walk 


The simple, symmetric random walk $\{S_n\}_{n=0}^\infty$ on $\mathbb{Z}$ starts at step $n = 0$ at $0 \in \mathbb{Z}$ and at each successive step jumps one unit to the right or left, each with probability $\frac12$. The random walk is called “simple” because the sizes of its jumps are restricted to the set $\{1,-1\}$. One way to realize this random walk is as follows. Let $\{X_n\}_{n=1}^\infty$ be an infinite sequence of independent, identically distributed random variables distributed according to the Bernoulli distribution with parameter $\frac12$ on $\{1,-1\}$; that is, $P(X_j = 1) = P(X_j = -1) = \frac12$. Now define $S_0 = 0$ and $S_n = \sum_{j=1}^n X_j$, $n \ge 1$.
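The construction above translates directly into a simulation. The following is a minimal Python sketch. The name `simple_random_walk` and the use of Python's `random` module are our own choices, not the book's. The sketch builds $S_0,\ldots,S_n$ by accumulating independent fair $\pm1$ steps:

```python
import random

def simple_random_walk(n, rng=None):
    """Return [S_0, S_1, ..., S_n] for the simple, symmetric random walk on Z.

    The increments X_1, ..., X_n are independent and equal to +1 or -1 with
    probability 1/2 each; S_0 = 0 and S_m = X_1 + ... + X_m.
    """
    rng = rng if rng is not None else random.Random()
    path = [0]  # S_0 = 0
    for _ in range(n):
        x = rng.choice((1, -1))  # one fair +/-1 step
        path.append(path[-1] + x)
    return path

walk = simple_random_walk(20, random.Random(1))
print(walk)
```

Passing a seeded `random.Random` makes runs reproducible. The walk itself depends only on the i.i.d. $\pm1$ increments.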

We begin with a fundamental fact about the simple, symmetric random walk 
on Z. 


Proposition 4.1.
$$P\Big(\limsup_{n\to\infty} S_n = \infty \text{ and } \liminf_{n\to\infty} S_n = -\infty\Big) = 1. \tag{4.1}$$


Remark 1. A moment’s thought shows that (4.1) is equivalent to the statement that the random walk is recurrent; that is, with probability one, $\{S_n\}_{n=0}^\infty$ visits every site in $\mathbb{Z}$ infinitely often.


Remark 2. One can consider a simple, symmetric random walk $\{S_n\}_{n=0}^\infty$ on $\mathbb{Z}^d$, the d-dimensional lattice: at each step it jumps in one of the $2d$ directions with probability $\frac{1}{2d}$. Again, the random walk is called recurrent if with probability one every site is visited infinitely often. It is called transient if $P(\lim_{n\to\infty}|S_n| = \infty) = 1$. In 1921, G. Pólya proved the quite surprising result that this random walk is recurrent in two dimensions but transient in three or more dimensions. For a proof of this, see, for example, [15].


Proof. By Remark 1 above, to prove the proposition, it suffices to prove that with probability one, the random walk visits every site in $\mathbb{Z}$ infinitely often. Let $p$ denote the probability that the random walk $\{S_n\}_{n=0}^\infty$ ever returns to its starting point 0. We will show that $p = 1$. Let $N_0$ denote the number of times the random walk is at 0 after time $n = 0$. Then of course, $P(N_0 = 0) = 1 - p$. Now let’s calculate $P(N_0 = 1)$. In order to have $N_0 = 1$, the random walk must return to 0 and then never return to 0 again. The probability of returning to 0 is $p$. If the random walk returns to 0, it continues independently of everything that has already transpired. Thus, conditioned on returning to 0, the probability that the random walk does not return to 0 again is $1 - p$. So $P(N_0 = 1) = p(1-p)$. Continuing with this line of reasoning, we obtain
$$P(N_0 = n) = p^n(1-p),\quad n = 0,1,\ldots.$$


If $p = 1$, it follows from the above reasoning that $P(N_0 = \infty) = 1$; that is, with probability one, the random walk visits 0 infinitely often. If $p \in (0,1)$, then the above calculation shows that $N_0$ is distributed according to the geometric distribution with parameter $p$. For $p \in (0,1)$, the expected value $EN_0$ of $N_0$ is given by
$$EN_0 = \sum_{n=0}^\infty nP(N_0 = n) = \sum_{n=0}^\infty np^n(1-p) = p(1-p)\sum_{n=1}^\infty np^{n-1} = p(1-p)\Big(\sum_{n=0}^\infty p^n\Big)' = p(1-p)\Big(\frac{1}{1-p}\Big)' = \frac{p}{1-p}. \tag{4.2}$$


(The term-by-term differentiation above is permitted because for any $p_0 < 1$, the differentiated series $\sum_{n=1}^\infty np^{n-1}$ is uniformly absolutely convergent over $p \in [0, p_0]$.) Of course, if $p = 1$, then $EN_0 = \infty$. Thus, the formula for $EN_0$ in (4.2) also holds if $p = 1$, with $\frac{p}{1-p}$ interpreted as $\infty$.

We now calculate $EN_0$ in a different way. Let $1_{\{S_n=0\}}$ denote the indicator random variable that is equal to 1 if $S_n = 0$ and is equal to 0 otherwise. Then $N_0$, the number of times the random walk returns to 0, can be represented as
$$N_0 = \sum_{n=1}^\infty 1_{\{S_n=0\}}.$$
n=1 


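By the linearity of the expectation and the nonnegativity of the summands (which allows the expectation to be taken term by term in the infinite sum), we conclude that
$$EN_0 = \sum_{n=1}^\infty P(S_n = 0), \tag{4.3}$$
since $E1_{\{S_n=0\}} = 0\cdot P(S_n \ne 0) + 1\cdot P(S_n = 0) = P(S_n = 0)$.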

Since the random walk starts at 0, it can only return to 0 at even times; thus, $P(S_{2n+1} = 0) = 0$. Since the random walk has two equally likely choices at each step, there are $2^{2n}$ equally likely paths that the random walk can traverse during its first $2n$ steps. Now one has $S_{2n} = 0$ if and only if from among the first $2n$ jumps, $n$ of them were to the right and $n$ of them were to the left. There are $\binom{2n}{n}$ such paths; thus,
$$P(S_{2n} = 0) = \frac{\binom{2n}{n}}{2^{2n}}. \tag{4.4}$$
Using Stirling’s formula, namely, $n! \sim n^ne^{-n}\sqrt{2\pi n}$ as $n \to \infty$, we have
$$\frac{\binom{2n}{n}}{2^{2n}} = \frac{(2n)!}{(n!)^2\,2^{2n}} \sim \frac{(2n)^{2n}e^{-2n}\sqrt{4\pi n}}{n^{2n}e^{-2n}(2\pi n)\,2^{2n}} = \frac{1}{\sqrt{\pi n}},\quad\text{as } n \to \infty. \tag{4.5}$$


Since $\sum_{n=1}^\infty \frac{1}{\sqrt{\pi n}} = \infty$, it follows from (4.3)–(4.5) that $EN_0 = \infty$. In light of (4.2), we conclude that $p = 1$.

We have shown that with probability one, the random walk returns to 0. 
Upon returning to 0, the random walk continues independently of everything that 
transpired previously; thus, in fact, with probability one, the random walk visits 0 
infinitely often. From this, it is easy to show that in fact with probability one the random walk visits every site infinitely often. We leave this as Exercise 4.1. □
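The two ingredients of this proof can be checked numerically. The Python sketch below has hypothetical helper names, and the series cutoff is an arbitrary choice. It evaluates the truncated series in (4.2), compares $P(S_{2n}=0)$ with the Stirling approximation in (4.5), and shows the partial sums in (4.3) growing without bound. It is an illustration, not a proof.

```python
import math

def expected_returns(p, terms=10_000):
    """Truncation of E N_0 = sum_{n>=0} n p^n (1 - p) from (4.2); needs 0 <= p < 1."""
    return sum(n * p**n * (1 - p) for n in range(terms))

def return_prob(n):
    """P(S_{2n} = 0) = C(2n, n) / 2^{2n}, as in (4.4), via exact integer arithmetic."""
    return math.comb(2 * n, n) / 4**n

def partial_sum(N):
    """sum_{n=1}^N P(S_{2n} = 0), using P(S_{2n} = 0) = P(S_{2n-2} = 0) (2n - 1) / (2n)."""
    u, total = 1.0, 0.0  # u starts at P(S_0 = 0) = 1
    for n in range(1, N + 1):
        u *= (2 * n - 1) / (2 * n)
        total += u
    return total

# (4.2): the truncated series should agree with p / (1 - p).
for p in (0.1, 0.5, 0.9):
    print(p, expected_returns(p), p / (1 - p))

# (4.5): the ratio P(S_{2n} = 0) / (1 / sqrt(pi n)) should tend to 1.
for n in (10, 100, 1000):
    print(n, return_prob(n) * math.sqrt(math.pi * n))

# (4.3): the partial sums keep growing, roughly like 2 sqrt(N / pi).
for N in (100, 1_000, 10_000):
    print(N, partial_sum(N))
```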


Define
$$T_0 = \inf\{n \ge 1 : S_n = 0\}.$$
The random time $T_0$ is called the first return time to 0. By Proposition 4.1, it follows that $P(T_0 < \infty) = 1$. However, perhaps surprisingly, one has $ET_0 = \infty$; the reader is guided through a proof of this in Exercise 4.2. This result suggests that there is quite some tendency for the random walk to take a long time to return to 0. In this chapter we present two results which give vivid expression to this phenomenon.

The arcsine distribution will figure prominently in the results of this chapter. The distribution function for this distribution is defined by
$$F_{\arcsin}(x) = \frac{2}{\pi}\arcsin\sqrt{x},\quad 0 \le x \le 1.$$
The corresponding density function $f_{\arcsin}(x) = F'_{\arcsin}(x)$ is given by
$$f_{\arcsin}(x) = \frac{1}{\pi}\,\frac{1}{\sqrt{x(1-x)}},\quad 0 < x < 1.$$
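For later comparison with exact finite-$n$ probabilities, here is a small Python sketch of $F_{\arcsin}$ and $f_{\arcsin}$ (function names are our own). The central-difference check is a numerical sanity test that $f_{\arcsin} = F'_{\arcsin}$, not a proof.

```python
import math

def arcsine_cdf(x):
    """F_arcsin(x) = (2/pi) arcsin(sqrt(x)) for 0 <= x <= 1."""
    return 2 / math.pi * math.asin(math.sqrt(x))

def arcsine_pdf(x):
    """f_arcsin(x) = 1 / (pi sqrt(x (1 - x))) for 0 < x < 1."""
    return 1 / (math.pi * math.sqrt(x * (1 - x)))

# Central differences of F should reproduce f away from the endpoints 0 and 1.
h = 1e-6
for x in (0.1, 0.5, 0.9):
    derivative = (arcsine_cdf(x + h) - arcsine_cdf(x - h)) / (2 * h)
    print(x, derivative, arcsine_pdf(x))
```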


Our first theorem concerns the random time
$$L_0^{(2n)} = \max\{k \le 2n : S_k = 0\},$$
which is the last return time to 0 up to step $2n$. By parity considerations, $L_0^{(2n)}$ can take on only even values.
Theorem 4.1.
$$P\big(L_0^{(2n)} = 2k\big) = \frac{\binom{2k}{k}\binom{2n-2k}{n-k}}{2^{2n}},\quad k = 0,1,\ldots,n. \tag{4.6}$$
Furthermore,
$$\lim_{n\to\infty} P\Big(\frac{L_0^{(2n)}}{2n} \le x\Big) = \frac{2}{\pi}\arcsin\sqrt{x},\quad 0 \le x \le 1. \tag{4.7}$$
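Since there are only $2^{2n}$ paths of length $2n$, formula (4.6) can be confirmed for small $n$ by exhaustive enumeration. In the sketch below, helper names are hypothetical, and enumeration is only practical for small $n$. `last_zero_distribution(n)` counts paths by the value of $L_0^{(2n)}$, and `formula_46(n, k)` is the numerator $\binom{2k}{k}\binom{2n-2k}{n-k}$ of (4.6).

```python
import math
from collections import Counter
from itertools import product

def last_zero_distribution(n):
    """Count all 2^(2n) paths by L_0^{(2n)} = max{k <= 2n : S_k = 0}."""
    counts = Counter()
    for steps in product((1, -1), repeat=2 * n):
        s, last = 0, 0  # S_0 = 0, so k = 0 always qualifies
        for k, x in enumerate(steps, start=1):
            s += x
            if s == 0:
                last = k
        counts[last] += 1
    return counts

def formula_46(n, k):
    """Number of paths with L_0^{(2n)} = 2k according to (4.6)."""
    return math.comb(2 * k, k) * math.comb(2 * n - 2 * k, n - k)

n = 5
counts = last_zero_distribution(n)
for k in range(n + 1):
    print(2 * k, counts[2 * k], formula_46(n, k))
```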


Remark. This theorem highlights the tendency of the random walk to take a long time to return to 0. Indeed, since the density $f_{\arcsin}(x)$ blows up at $x = 0, 1$, it follows from (4.7) that for large $n$ the most likely epochs $k$ for the last visit to 0 up to time $2n$ are those satisfying $k = o(n)$ or $k = 2n - o(n)$, that is, those epochs at the very beginning or at the very end of the trajectory. Since $\frac{2}{\pi}\arcsin\sqrt{\frac12} = \frac12$, from (4.7) it also follows that for large $n$, there is a probability of about $\frac12$ that a random walk trajectory of $2n$ steps will never return to 0 during the second half of its life.


Our second theorem concerns the random variable $O_{2n}^+$, which should be thought of as the number of steps $k \in [2n]$ at which the random walk is positive (or nonnegative). Of course, the number of steps between 1 and $2n$ at which the random walk is positive is usually not equal to the number of steps at which it is nonnegative. In order to obtain an exact result in closed form for all $n$, we need to work in a symmetric setting. Therefore, if the random walk is equal to 0 at some step $2k$, we classify that step as “positive” if $S_{2k-1} > 0$ and “negative” if $S_{2k-1} < 0$. That is,
$$O_{2n}^+ = \big|\{k \in [2n] : S_k > 0 \text{ or } S_k = 0 \text{ and } S_{k-1} > 0\}\big|.$$
We call $O_{2n}^+$ the occupation time of the positive half line up to time $2n$. Then $\frac{O_{2n}^+}{2n}$ gives the fraction of steps among the first $2n$ steps that the random walk is in the positive half line. Note that by parity considerations, $O_{2n}^+$ can only take on even values.


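Theorem 4.2.
$$P\big(O_{2n}^+ = 2k\big) = \frac{\binom{2k}{k}\binom{2n-2k}{n-k}}{2^{2n}},\quad k = 0,1,\ldots,n. \tag{4.8}$$
Furthermore,
$$\lim_{n\to\infty} P\Big(\frac{O_{2n}^+}{2n} \le x\Big) = \frac{2}{\pi}\arcsin\sqrt{x},\quad 0 \le x \le 1. \tag{4.9}$$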
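The same brute-force check works for (4.8), provided the symmetric convention for steps at which the walk is at 0 is implemented exactly as in the definition of $O_{2n}^+$. A minimal Python sketch, with our own helper name:

```python
import math
from collections import Counter
from itertools import product

def positive_time_distribution(n):
    """Count all 2^(2n) paths by O_{2n}^+ under the symmetric convention:
    step k counts as positive if S_k > 0, or if S_k = 0 and S_{k-1} > 0."""
    counts = Counter()
    for steps in product((1, -1), repeat=2 * n):
        prev, s, occ = 0, 0, 0
        for x in steps:
            prev, s = s, s + x  # prev = S_{k-1}, s = S_k
            if s > 0 or (s == 0 and prev > 0):
                occ += 1
        counts[occ] += 1
    return counts

n = 5
counts = positive_time_distribution(n)
for k in range(n + 1):
    print(2 * k, counts[2 * k], math.comb(2 * k, k) * math.comb(2 * n - 2 * k, n - k))
```

Note that the printed counts coincide with those for $L_0^{(2n)}$: Theorems 4.1 and 4.2 assert the same distribution for two different random variables.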
Fig. 4.1 A random walk path of length 17


Remark 1. Since the density $f_{\arcsin}(x)$ takes on its minimum at $x = \frac12$, and since it blows up at $x = 0, 1$, it follows that for large $n$ the most likely percentages of time that a random walk trajectory is nonnegative are around 0% and 100%, while the least likely percentage is around 50%! To put it in a different way, if two players bet a dollar each on a succession of fair coin flips, then after a long time it is overwhelmingly more likely that one of the players was leading almost the whole time than that each player was leading about half the time. This result even more vividly highlights the tendency of the random walk to take a long time to return to 0.


Remark 2. Let $O_{2n}^0 = |\{k \in [2n] : S_k = 0\}|$ denote the number of visits to 0 of the random walk up to step $2n$. It is not hard to show that the random variable $\frac{O_{2n}^0}{2n}$, denoting the fraction of steps up to $2n$ at which the random walk is at 0, converges to 0 in probability; that is,
$$\lim_{n\to\infty} P\Big(\frac{O_{2n}^0}{2n} > \epsilon\Big) = 0,\quad \text{for all } \epsilon > 0. \tag{4.10}$$
We leave this as Exercise 4.3. In light of this, it follows that (4.9) would also hold if we had defined $O_{2n}^+$ in an asymmetric fashion as the number of steps up to $2n$ for which the random walk is nonnegative: $|\{k \in [2n] : S_k \ge 0\}|$.


Our approach to proving the above two theorems will be completely combinatorial rather than probabilistic. Generating functions will play a central role. A random walk path of length $m$ is a path $\{x_j\}_{j=0}^m$ which satisfies
$$x_0 = 0; \qquad x_j - x_{j-1} = \pm1,\ j \in [m]. \tag{4.11}$$
See Fig. 4.1. Since a random walk path has two choices at each step, there are $2^m$ random walk paths of length $m$. The probability that the simple, symmetric random walk behaves in a certain way up until time $m$ is simply the number of random walk paths that behave in that certain way divided by $2^m$.
Fig. 4.2 A Dyck path of length 16


Our basic combinatorial object upon which our results will be developed is the Dyck path. A Dyck path of length $2n$ is a nonnegative random walk path $\{x_j\}_{j=0}^{2n}$ of length $2n$ which returns to 0 at step $2n$; that is, in addition to satisfying (4.11) with $m = 2n$, it also satisfies the following conditions:
$$x_j \ge 0,\ j \in [2n]; \qquad x_{2n} = 0. \tag{4.12}$$
See Fig. 4.2. We use generating functions to determine the number of Dyck paths. Let $d_n$ denote the number of Dyck paths of length $2n$. We also define $d_0 = 1$.


Proposition 4.2. The number of Dyck paths of length $2n$ is given by
$$d_n = \frac{1}{n+1}\binom{2n}{n},\quad n \ge 1.$$

Remark. The number $C_n := \frac{1}{n+1}\binom{2n}{n}$ is known as the $n$th Catalan number.


Proof. We derive a recursion formula for $\{d_n\}_{n=0}^\infty$. A primitive Dyck path of length $2k$ is a Dyck path $\{x_j\}_{j=0}^{2k}$ of length $2k$ which satisfies $x_j > 0$ for $j = 1,\ldots,2k-1$. Let $v_k$ denote the number of primitive Dyck paths of length $2k$. Every Dyck path of length $2n$ returns to 0 for the first time at $2k$, for some $k \in [n]$. Consider a Dyck path of length $2n$ that returns to 0 for the first time at $2k$. The part of the path from time 0 to time $2k$ is a primitive Dyck path of length $2k$, and the part of the path from time $2k$ to $2n$ is an arbitrary Dyck path of length $2n - 2k$. (In Fig. 4.2, the Dyck path of length 16 is composed of an initial primitive Dyck path of length 6, followed by a Dyck path of length 10.) This reasoning yields the recurrence relation
$$d_n = \sum_{k=1}^n v_kd_{n-k},\quad n \ge 1. \tag{4.13}$$
k=1 


Now we claim that
$$v_k = d_{k-1},\quad k \ge 1. \tag{4.14}$$
Indeed, a primitive Dyck path $\{x_j\}_{j=0}^{2k}$ must satisfy $x_1 = 1$, $x_j \ge 1$ for $j \in [2k-2]$, $x_{2k-1} = 1$, and $x_{2k} = 0$. Thus, letting $y_j = x_{j+1} - 1$, $0 \le j \le 2k-2$, it follows that $\{y_j\}_{j=0}^{2k-2}$ is a Dyck path. Of course, this analysis can be reversed. This shows that there is a 1-1 correspondence between primitive Dyck paths of length $2k$ and arbitrary Dyck paths of length $2(k-1)$, proving (4.14). From (4.13) and (4.14) we obtain the Dyck path recursion formula
$$d_n = \sum_{k=1}^n d_{k-1}d_{n-k},\quad n \ge 1. \tag{4.15}$$
Let
$$D(x) = \sum_{n=0}^\infty d_nx^n \tag{4.16}$$
be the generating function for $\{d_n\}_{n=0}^\infty$. Since there are $2^{2n}$ random walk paths of length $2n$, we have the trivial estimate $d_n \le 2^{2n} = 4^n$. Thus, the power series defining $D(x)$ is absolutely convergent for $|x| < \frac14$. The product of two absolutely convergent power series $\sum_{n=0}^\infty a_nx^n$ and $\sum_{n=0}^\infty b_nx^n$ is $\sum_{n=0}^\infty c_nx^n$, where $c_n = \sum_{j=0}^n a_jb_{n-j}$. Thus, if in (4.15) the term $d_{k-1}$ were $d_k$ instead, and the summation started from $k = 0$ instead of from $k = 1$, then we would have had $D^2(x) = D(x)$. As it is, we “correct” for these deficiencies by multiplying by $x$ and adding 1: it is easy to check that (4.16) and (4.15) give
$$D(x) = xD^2(x) + 1. \tag{4.17}$$


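Solving this quadratic equation in $D$ gives $D(x) = \frac{1 \pm \sqrt{1-4x}}{2x}$. Since we know from (4.16) that $D(0) = 1$, we must take the minus sign (with the plus sign, $D(x)$ would blow up as $x \to 0$), and we conclude that the generating function for $\{d_n\}_{n=0}^\infty$ is given by
$$D(x) = \frac{1 - \sqrt{1-4x}}{2x},\quad |x| < \frac14. \tag{4.18}$$
Now $(1-4x)^{\frac12}\big|_{x=0} = 1$, $\big((1-4x)^{\frac12}\big)'\big|_{x=0} = -2$, and
$$\big((1-4x)^{\frac12}\big)^{(n)}\Big|_{x=0} = -\frac12\cdot\frac12\cdot\frac32\cdots\frac{2n-3}{2}\,4^n = -\frac{4^n(2n-2)!}{2^n\,2^{n-1}(n-1)!} = -n!\,\frac{2}{2n-1}\binom{2n-1}{n},\quad\text{for } n \ge 2;$$
thus, the Taylor series for $\sqrt{1-4x}$ is given by
$$\sqrt{1-4x} = 1 - 2x - \sum_{n=2}^\infty \frac{2}{2n-1}\binom{2n-1}{n}x^n. \tag{4.19}$$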

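The coefficient of $x^{n+1}$ in (4.19) is $-\frac{2}{2n+1}\binom{2n+1}{n+1} = -\frac{2}{n+1}\binom{2n}{n}$, $n \ge 0$. Using this along with (4.18) and (4.19), we conclude that
$$D(x) = \sum_{n=0}^\infty \frac{1}{n+1}\binom{2n}{n}x^n,\quad |x| < \frac14. \tag{4.20}$$
From (4.20) and (4.16) it follows that $d_n = \frac{1}{n+1}\binom{2n}{n}$. □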
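The recursion (4.15) also gives a direct way to compute the $d_n$. The Python sketch below (function names are our own) compares it with the closed form of Proposition 4.2. For small $n$, it also compares both with a brute-force count of paths satisfying (4.11) and (4.12):

```python
import math
from itertools import product

def dyck_by_recursion(N):
    """d_0, ..., d_N from d_0 = 1 and the recursion (4.15): d_n = sum_{k=1}^n d_{k-1} d_{n-k}."""
    d = [1]
    for n in range(1, N + 1):
        d.append(sum(d[k - 1] * d[n - k] for k in range(1, n + 1)))
    return d

def dyck_by_enumeration(n):
    """Count +/-1 paths of length 2n that stay >= 0 and end at 0, i.e. satisfy (4.12)."""
    count = 0
    for steps in product((1, -1), repeat=2 * n):
        s = 0
        for x in steps:
            s += x
            if s < 0:
                break
        else:  # no break: the path never went below 0
            count += (s == 0)
    return count

d = dyck_by_recursion(10)
print(d)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
print([math.comb(2 * n, n) // (n + 1) for n in range(11)])  # same list, via Proposition 4.2
```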


The proof of the proposition gives us the following corollary.

Corollary 4.1. The generating function for the sequence $\{d_n\}_{n=0}^\infty$, which counts Dyck paths, is given by
$$D(x) = \frac{1 - \sqrt{1-4x}}{2x},\quad |x| < \frac14.$$

Let $w_n$ denote the number of nonnegative random walk paths of length $2n$. The difference between such a path and a Dyck path is that for such a path there is no requirement that it return to 0 at time $2n$. We also define $w_0 = 1$. We now calculate $\{w_n\}_{n=0}^\infty$ by deriving a recursion formula which involves $\{d_n\}_{n=0}^\infty$.


Proposition 4.3. The number $w_n$ of nonnegative random walk paths of length $2n$ is given by
$$w_n = \binom{2n}{n},\quad n \ge 1. \tag{4.21}$$

Remark. The number of random walk paths of length $2n$ that return to 0 at time $2n$ is also given by $\binom{2n}{n}$, since to obtain such a path, we must choose $n$ jumps of $+1$ and $n$ jumps of $-1$. Thus, we have the following somewhat surprising corollary.

Corollary 4.2.
$$P(S_1 \ge 0, \ldots, S_{2n} \ge 0) = P(S_{2n} = 0).$$


Proof of Proposition 4.3. Of course every nonnegative random walk path of length $2n+2$, when restricted to its first $2n$ steps, constitutes a nonnegative random walk path of length $2n$. A nonnegative random walk path of length $2n$ which does not return to 0 at time $2n$, that is, which is not a Dyck path, ends at height at least 2, so it can be extended in four different ways to create a nonnegative random walk path of length $2n+2$. On the other hand, a nonnegative random walk path of length $2n$ which is a Dyck path must take a $+1$ step next, so it can only be extended in two different ways to create a nonnegative random walk path of length $2n+2$. Thus, we have the relation
$$w_{n+1} = 4(w_n - d_n) + 2d_n = 4w_n - 2d_n,\quad n \ge 0. \tag{4.22}$$


Let
$$W(x) = \sum_{n=0}^\infty w_nx^n$$
be the generating function for $\{w_n\}_{n=0}^\infty$. As with the power series defining $D(x)$, it is clear that the power series defining $W(x)$ converges for $|x| < \frac14$. Multiply equation (4.22) by $x^n$ and sum over $n$ from 0 to $\infty$. On the left side we obtain $\sum_{n=0}^\infty w_{n+1}x^n = \frac1x(W(x) - 1)$, and on the right-hand side we obtain $4W(x) - 2D(x)$. From the resulting equation, $\frac1x(W(x) - 1) = 4W(x) - 2D(x)$, we obtain
$$W(x) = \frac{1 - 2xD(x)}{1 - 4x}. \tag{4.23}$$
Substituting for $D(x)$ in (4.23) from Corollary 4.1, we obtain
$$W(x) = \frac{1}{\sqrt{1-4x}},\quad |x| < \frac14.$$


We have $W(0) = 1$, and for $n \ge 1$,
$$\frac{W^{(n)}(0)}{n!} = \frac{1}{n!}\Big((1-4x)^{-\frac12}\Big)^{(n)}\Big|_{x=0} = \frac{2^n}{n!}\prod_{i=1}^n(2i-1) = \frac{1}{n!}\,\frac{2^n(2n)!}{2^nn!} = \binom{2n}{n}.$$


Thus the Taylor series for $W(x)$ is given by
$$W(x) = \sum_{n=0}^\infty \binom{2n}{n}x^n,$$
and we conclude that $w_n = \binom{2n}{n}$. □
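As a check on Proposition 4.3, the following Python sketch iterates the recursion (4.22) using the Dyck path counts from Proposition 4.2 (helper names are illustrative). It then compares the result with a brute-force count of nonnegative paths and with $\binom{2n}{n}$:

```python
import math
from itertools import product

def nonnegative_paths(N):
    """w_0, ..., w_N from w_0 = 1 and the recursion (4.22), w_{n+1} = 4 w_n - 2 d_n,
    where d_n = C(2n, n) / (n + 1) is the Dyck path count of Proposition 4.2."""
    w = [1]
    for n in range(N):
        d_n = math.comb(2 * n, n) // (n + 1)
        w.append(4 * w[-1] - 2 * d_n)
    return w

def nonnegative_paths_brute(n):
    """Count +/-1 paths of length 2n with all partial sums >= 0 (no endpoint condition)."""
    count = 0
    for steps in product((1, -1), repeat=2 * n):
        s = 0
        for x in steps:
            s += x
            if s < 0:
                break
        else:  # no break: the path stayed nonnegative
            count += 1
    return count

w = nonnegative_paths(10)
print(w)  # [1, 2, 6, 20, 70, 252, 924, 3432, 12870, 48620, 184756]
```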


Armed with Propositions 4.2 and 4.3, we can give a quick proof of (4.6).

Proof of Theorem 4.1. Since $L_0^{(2n)} = 2n$ exactly when $S_{2n} = 0$, it follows from the remark after Proposition 4.3 that (4.6) holds for $k = n$. So we now assume that $k \in \{0,1,\ldots,n-1\}$. Given a random walk path $\{x_j\}_{j=0}^m$, we define the negative of the path to be the path $\{-x_j\}_{j=0}^m$.


If a random walk path of length $2n$ satisfies $L_0^{(2n)} = 2k$, then its first $2k$ steps constitute a random walk path that returns to 0 at time $2k$, and its last $2n - 2k$ steps constitute either a random walk path that is strictly positive or the negative of such a path. As noted in the remark after Proposition 4.3, there are $\binom{2k}{k}$ random walk paths of length $2k$ that return to 0 at time $2k$. How many strictly positive random walk paths of length $2n - 2k$ are there? Let $\{x_j\}_{j=0}^{2n-2k}$ be such a path. Then $x_1 = 1$, and by parity considerations, $x_{2n-2k} \ge 2$. Consider now the part of the path from time 1 to time $2n - 2k$. If we relabel and subtract one, $y_j = x_{j+1} - 1$, $j = 0,1,\ldots,2n-2k-1$, then we obtain a nonnegative random walk path of length $2n - 2k - 1$. Since $y_{2n-2k-1} = x_{2n-2k} - 1 \ge 1$, by defining $y_{2n-2k} = y_{2n-2k-1} \pm 1$, we can extend this path in two ways to get a nonnegative random walk path of length $2n - 2k$. This reasoning shows that there is a two-to-one correspondence between nonnegative random walk paths of length $2n - 2k$ and strictly positive random walk paths of length $2n - 2k$. We know that there are $w_{n-k} = \binom{2n-2k}{n-k}$ nonnegative random walk paths of length $2n - 2k$; thus, we conclude that the number of strictly positive random walk paths of length $2n - 2k$ is equal to $\frac12\binom{2n-2k}{n-k}$. Counting both a strictly positive final segment and its negative, we conclude from the above analysis that the number of random walk paths of length $2n$ that satisfy $L_0^{(2n)} = 2k$ is equal to
$$\binom{2k}{k}\cdot 2\cdot\frac12\binom{2n-2k}{n-k} = \binom{2k}{k}\binom{2n-2k}{n-k},$$
from which (4.6) follows.
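We now consider (4.7). In Exercise 4.4 the reader is asked to apply Stirling’s formula and show that for any $\epsilon > 0$,
$$\frac{\binom{2k}{k}\binom{2n-2k}{n-k}}{2^{2n}} \sim \frac{1}{\pi}\,\frac{1}{\sqrt{k(n-k)}},\quad\text{uniformly over } \epsilon n \le k \le (1-\epsilon)n, \text{ as } n \to \infty. \tag{4.24}$$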
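Using (4.24) and (4.6), we have for $0 < a < b < 1$,
$$P\Big(a < \frac{L_0^{(2n)}}{2n} \le b\Big) = \sum_{k=[na]+1}^{[nb]} \frac{\binom{2k}{k}\binom{2n-2k}{n-k}}{2^{2n}} \sim \sum_{k=[na]+1}^{[nb]} \frac{1}{\pi}\,\frac{1}{\sqrt{k(n-k)}} = \frac{1}{\pi}\sum_{k=[na]+1}^{[nb]} \frac{1}{\sqrt{\frac{k}{n}\big(1-\frac{k}{n}\big)}}\,\frac{1}{n},\quad\text{as } n \to \infty. \tag{4.25}$$
But the last term on the right-hand side of (4.25) is a Riemann sum for $\frac{1}{\pi}\int_a^b \frac{1}{\sqrt{x(1-x)}}\,dx$. Thus, letting $n \to \infty$ in (4.25) gives, for $0 < a < b < 1$,
$$\lim_{n\to\infty} P\Big(a < \frac{L_0^{(2n)}}{2n} \le b\Big) = \frac{1}{\pi}\int_a^b \frac{1}{\sqrt{x(1-x)}}\,dx = \frac{2}{\pi}\arcsin\sqrt{b} - \frac{2}{\pi}\arcsin\sqrt{a},$$
which is equivalent to (4.7). This completes the proof of Theorem 4.1. □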
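To see how quickly the limit (4.7) sets in, one can sum the exact probabilities (4.6) and compare the result with $F_{\arcsin}$. This Python sketch uses our own helper names, and the chosen values of $n$ and $x$ are arbitrary. It prints the differences $P(L_0^{(2n)}/(2n) \le x) - \frac{2}{\pi}\arcsin\sqrt{x}$, which shrink as $n$ grows:

```python
import math

def last_zero_cdf(n, x):
    """P(L_0^{(2n)} / (2n) <= x), summed exactly from the point masses in (4.6)."""
    total = sum(math.comb(2 * k, k) * math.comb(2 * n - 2 * k, n - k)
                for k in range(n + 1) if k <= n * x)
    return total / 4**n

def arcsine_cdf(x):
    """Limiting distribution function (2/pi) arcsin(sqrt(x)) from (4.7)."""
    return 2 / math.pi * math.asin(math.sqrt(x))

for n in (10, 100, 1000):
    gaps = [last_zero_cdf(n, x) - arcsine_cdf(x) for x in (0.1, 0.25, 0.5, 0.75)]
    print(n, [round(g, 4) for g in gaps])
```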


We now turn to the proof of Theorem 4.2. 
Proof of Theorem 4.2. We need to prove (4.8). Of course, (4.9) follows from (4.8) just like (4.7) followed from (4.6). Recalling the symmetric definition of $O_{2n}^+$, for the purpose of this proof, we will refer to $S_j$ as “positive” if either $S_j > 0$, or $S_j = 0$ and $S_{j-1} > 0$. Let $c_{n,k}$ denote the number of random walk paths of length $2n$ which are positive at exactly $2k$ steps. Since there are $2^{2n}$ random walk paths of length $2n$, in order to prove (4.8), we need to prove that
$$c_{n,k} = \binom{2k}{k}\binom{2n-2k}{n-k},\quad k = 0,1,\ldots,n. \tag{4.26}$$


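A path is positive at all $2n$ steps exactly when it is nonnegative, so by Proposition 4.3, we have $c_{n,n} = \binom{2n}{n}$, and by symmetry, $c_{n,0} = \binom{2n}{n}$; thus, (4.26) holds for $k = 0, n$.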

Consider now $k \in [n-1]$. A random walk path that satisfies $O_{2n}^+ = 2k$ must return to 0 before step $2n$. Consider the first return to 0. If the path was positive before the first return to 0, then the first return to 0 must occur at step $2j$, for some $j \in [k]$ (for otherwise, the path would be positive for more than $2k$ steps). If the path was negative before the first return to 0, then the first return to 0 must occur at step $2j$, for some $j \in [n-k]$ (for otherwise the path would be positive for fewer than $2k$ steps). In light of these facts, and recalling that $v_j = d_{j-1}$ is the number of primitive Dyck paths of length $2j$, it follows that for $j \in [k]$, the number of random walk paths of length $2n$ which start out positive, return to 0 for the first time at step $2j$, and are positive for exactly $2k$ steps is equal to $d_{j-1}c_{n-j,k-j}$. Similarly, for $j \in [n-k]$, the number of random walk paths of length $2n$ which start out negative, return to 0 for the first time at step $2j$, and are positive for exactly $2k$ steps is equal to $d_{j-1}c_{n-j,k}$. Thus, we obtain the recursion relation
$$c_{n,k} = \sum_{j=1}^k d_{j-1}c_{n-j,k-j} + \sum_{j=1}^{n-k} d_{j-1}c_{n-j,k},\quad k \in [n-1]. \tag{4.27}$$
Let $e_n := \binom{2n}{n}$, $n \ge 0$. As follows from the remark after Proposition 4.3, for $n \ge 1$, $e_n$ is the number of random walk paths of length $2n$ that are equal to 0 at step $2n$. We derive a recursion formula for $\{e_n\}_{n=1}^\infty$. A random walk path of length $2n$ which is equal to 0 at step $2n$ must return to 0 for the first time at step $2k$, for some $k \in [n]$. The number of random walk paths of length $2n$ which are equal to 0 at time $2n$ and which return to 0 for the first time at step $2k$ is equal to $2v_ke_{n-k} = 2d_{k-1}e_{n-k}$ (the factor 2 accounts for the first excursion being either positive or negative). Consequently, we obtain the recursion formula
$$e_n = \sum_{k=1}^n 2d_{k-1}e_{n-k},\quad n \ge 1. \tag{4.28}$$


We can now prove (4.26) by considering (4.27) and (4.28) and applying induction. To prove (4.26) we need to show that for all $n \ge 1$,
$$c_{n,k} = e_ke_{n-k},\quad k = 0,1,\ldots,n. \tag{4.29}$$
When $n = 1$, (4.29) clearly holds. We now assume that (4.29) holds for all $n \le n_0$ and prove that it also holds for $n = n_0 + 1$. When $n = n_0 + 1$ and $k = 0$ or $k = n_0 + 1$, we already know that (4.29) holds. So we need to show that (4.29) holds for $n = n_0 + 1$ and $k \in [n_0]$. Using (4.27) for the first equality, using the inductive assumption for the second equality, and using (4.28) for the third equality, we have
$$\begin{aligned} c_{n_0+1,k} &= \sum_{j=1}^k d_{j-1}c_{n_0+1-j,k-j} + \sum_{j=1}^{n_0+1-k} d_{j-1}c_{n_0+1-j,k} \\ &= \sum_{j=1}^k d_{j-1}e_{k-j}e_{n_0+1-k} + \sum_{j=1}^{n_0+1-k} d_{j-1}e_ke_{n_0+1-k-j} \\ &= \frac12 e_ke_{n_0+1-k} + \frac12 e_{n_0+1-k}e_k = e_ke_{n_0+1-k}, \end{aligned} \tag{4.30}$$
which proves that (4.29) holds for $n = n_0 + 1$ and completes the proof of Theorem 4.2. □
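The induction can be mirrored numerically. The Python sketch below uses our own helper, `c_table`. It fills in $c_{n,k}$ using only the boundary values $c_{n,0} = c_{n,n} = \binom{2n}{n}$ and the recursion (4.27), then checks (4.29) and (4.28) for small $n$. It is a sanity check of the algebra, not a replacement for the proof.

```python
import math

def c_table(N):
    """c[n][k] for 0 <= k <= n <= N: boundary values c_{n,0} = c_{n,n} = C(2n, n),
    interior values from the recursion (4.27)."""
    d = [math.comb(2 * m, m) // (m + 1) for m in range(N + 1)]  # Catalan numbers d_m
    c = [[0] * (n + 1) for n in range(N + 1)]
    for n in range(N + 1):
        c[n][0] = c[n][n] = math.comb(2 * n, n)
        for k in range(1, n):
            c[n][k] = (sum(d[j - 1] * c[n - j][k - j] for j in range(1, k + 1))
                       + sum(d[j - 1] * c[n - j][k] for j in range(1, n - k + 1)))
    return c

c = c_table(12)
e = [math.comb(2 * m, m) for m in range(13)]
# (4.29): c_{n,k} = e_k e_{n-k} for every entry of the table
print(all(c[n][k] == e[k] * e[n - k] for n in range(13) for k in range(n + 1)))
```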


Exercise 4.1. This exercise completes the proof of Proposition 4.1. We proved that with probability one, the simple, symmetric random walk on $\mathbb{Z}$ visits 0 infinitely often.

(a) For fixed $x \in \mathbb{Z}$, use the fact that with probability one the random walk visits 0 infinitely often to show that with probability one the random walk visits $x$ infinitely often. (Hint: Every time the process returns to 0, it has probability $(\frac12)^{|x|}$ of moving directly to $x$ in $|x|$ steps.)

(b) Show that with probability one the random walk visits every $x \in \mathbb{Z}$ infinitely often.


Exercise 4.2. In this exercise, you will prove that $ET_0 = \infty$, where $T_0$ is the first return time to 0. We can consider the random walk starting from any $j \in \mathbb{Z}$, rather than just from 0. When we start the random walk from $j$, denote the corresponding probabilities and expectations by $P_j$ and $E_j$. Fix $n \ge 1$ and consider starting the random walk from some $j \in \{0,1,\ldots,n\}$. Let $T_{0,n}$ denote the first nonnegative time that the random walk is at 0 or $n$.

(a) Define $g(j) = E_jT_{0,n}$. By analyzing what happens on the first step, show that $g$ solves the difference equation $g(j) = 1 + \frac12 g(j+1) + \frac12 g(j-1)$, for $j = 1,\ldots,n-1$. Note that one has the boundary conditions $g(0) = g(n) = 0$.

(b) Use (a) to show that $E_jT_{0,n} = j(n-j)$. (Hint: Write the difference equation in the form $g(j+1) - g(j) = g(j) - g(j-1) - 2$.)

(c) In particular, (b) gives $E_1T_{0,n} = n - 1$. From this, conclude that $ET_0 = \infty$.
Exercise 4.3. Prove (4.10): $\lim_{n\to\infty} P\big(\frac{O_{2n}^0}{2n} > \epsilon\big) = 0$, for all $\epsilon > 0$. (Hint: Represent $O_{2n}^0$ by $O_{2n}^0 = \sum_{k=1}^{2n} 1_{\{S_k=0\}}$, where $1_{\{S_k=0\}}$ is as in the proof of Proposition 4.1. From this representation, show that $\lim_{n\to\infty} E\frac{O_{2n}^0}{2n} = 0$. Conclude from this that (4.10) holds.)


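Exercise 4.4. Use Stirling’s formula to prove (4.24). That is, show that for any $\epsilon, \delta > 0$, there exists an $n_{\epsilon,\delta}$ such that if $n \ge n_{\epsilon,\delta}$, then
$$1 - \delta \le \frac{\binom{2k}{k}\binom{2n-2k}{n-k}\big/2^{2n}}{\frac{1}{\pi}\frac{1}{\sqrt{k(n-k)}}} \le 1 + \delta,$$
for all $k$ satisfying $\epsilon n \le k \le (1-\epsilon)n$.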


Exercise 4.5. If one considers a simple, symmetric random walk $\{S_n\}_{n=0}^\infty$ up to time $2n$, the probability of seeing any particular one of the $2^{2n}$ random walk paths of length $2n$ is equal to $2^{-2n}$. Recall from the remark after Proposition 4.3 that there are $\binom{2n}{n}$ random walk paths of length $2n$ that return to 0 at time $2n$. It follows from symmetry that conditioned on $S_{2n} = 0$, the probability of seeing any particular one of the $\binom{2n}{n}$ random walk paths of length $2n$ which return to 0 at time $2n$ is equal to $\frac{1}{\binom{2n}{n}}$.

(a) Let $p \in (0,1) - \{\frac12\}$ and consider the simple random walk on $\mathbb{Z}$ which jumps one unit to the right with probability $p$ and one unit to the left with probability $1 - p$. Denote the random walk by $\{S_n^{(p)}\}_{n=0}^\infty$. Consider this random walk up to time $2n$. For each particular random walk path of length $2n$, calculate the probability of seeing this path. The answer now depends on the path.

(b) Conditioned on $S_{2n}^{(p)} = 0$, show that the probability of seeing any particular one of the $\binom{2n}{n}$ random walk paths of length $2n$ which return to 0 at time $2n$ is equal to $\frac{1}{\binom{2n}{n}}$.
Gr) 

Exercise 4.6. Let $0 < j < m$. Consider the random walk $\{S_n^{(p)}\}_{n=0}^\infty$ as in Exercise 4.5, with $p \in (0,1)$, but starting from $j$, and denote probabilities by $P_j$. Let $T_{0,m}^{(p)}$ denote the first nonnegative time that this random walk is at 0 or at $m$. Use the method of Exercise 4.2 (analyzing what happens on the first step) to calculate $P_j\big(S^{(p)}_{T_{0,m}^{(p)}} = 0\big)$, that is, the probability that starting from $j$, the random walk reaches 0 before it reaches $m$. (Hint: The calculation in the case $p = \frac12$ needs to be treated separately.)


Chapter Notes 


The arcsine law in Theorem 4.2 was first proven by P. Lévy in 1939 in the context of Brownian motion, which is a continuous time and continuous path version of the simple, symmetric random walk. The proof of Theorem 4.2 is due to K.L. Chung and W. Feller. One can find a proof in volume 1 of Feller’s classic text in probability [19]. One can also find there a proof of Theorem 4.1. Our proofs of these theorems are a little different from Feller’s proofs. As expected, the proofs in Feller’s book have a probabilistic flavor. We have taken a more combinatorial/counting approach via generating functions. Proposition 4.3 and Corollary 4.2 can be derived alternatively via the “reflection principle”; see [19]. For a nice little book on random walks from the point of view of electrical networks, see Doyle and Snell [15]; for a treatise on random walks, see the book by Spitzer [32].


Chapter 5 
The Distribution of Cycles in Random 
Permutations 


In this chapter we study the limiting behavior of the total number of cycles and of the number of cycles of fixed length in random permutations of $[n]$ as $n \to \infty$. This class of problems springs from a classical question in probability called the envelope matching problem. You have $n$ letters and $n$ addressed envelopes. If you randomly place one letter in each envelope, what is the asymptotic probability as $n \to \infty$ that no letter is in its correct envelope?

Let $S_n$ denote the set of permutations of $[n]$. Of course, $S_n$ is a group, but the group structure will not be relevant for our purposes. For us, a permutation $\sigma \in S_n$ is simply a 1-1 map of $[n]$ onto $[n]$. The notation $\sigma_j$ will be used to denote the image of $j \in [n]$ under this map. We have $|S_n| = n!$. Let $P_n$ denote the uniform probability measure on $S_n$. That is, $P_n(A) = \frac{|A|}{n!}$ for any subset $A \subset S_n$. If $\sigma_j = j$, then $j$ is called a fixed point for the permutation $\sigma$. Let $D_n \subset S_n$ denote the set of permutations that do not fix any points; that is, $\sigma \in D_n$ if $\sigma_j \ne j$ for all $j \in [n]$. Such permutations are called derangements. The classical envelope matching problem then asks for $\lim_{n\to\infty} P_n(D_n)$.

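The standard way to solve the envelope matching problem is by the method of inclusion–exclusion. Define $G_i = \{\sigma \in S_n : \sigma_i = i\}$. (We suppress the dependence of $G_i$ on $n$ since $n$ is fixed in this discussion.) Then the complement $D_n^c$ of $D_n$ is given by $D_n^c = \cup_{i=1}^n G_i$, and the inclusion–exclusion principle states that
$$P(\cup_{i=1}^n G_i) = \sum_{i=1}^n P(G_i) - \sum_{1\le i<j\le n} P(G_i\cap G_j) + \sum_{1\le i<j<k\le n} P(G_i\cap G_j\cap G_k) - \cdots + (-1)^{n-1}P(\cap_{i=1}^n G_i).$$
(See Exercise A.2 in Appendix A.) Each of the probabilities above can be computed readily: the intersection of any $m$ of the sets $G_i$ has probability $\frac{(n-m)!}{n!}$, and there are $\binom{n}{m}$ such intersections. After some calculations one finds that $P(D_n) = 1 - P(\cup_{i=1}^n G_i) = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n\frac{1}{n!}$; thus, $\lim_{n\to\infty} P(D_n) = e^{-1}$.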
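For small $n$, the inclusion–exclusion answer can be compared with a brute-force count over all $n!$ permutations. The sketch below uses our own helper names, with permutations of $\{0,\ldots,n-1\}$ standing in for $[n]$. It also shows the rapid convergence to $e^{-1}$.

```python
import math
from itertools import permutations

def derangement_prob(n):
    """P(D_n) = sum_{j=0}^n (-1)^j / j!, the inclusion-exclusion formula."""
    return sum((-1) ** j / math.factorial(j) for j in range(n + 1))

def derangement_prob_brute(n):
    """Fraction of permutations of {0, ..., n-1} that fix no point."""
    perms = list(permutations(range(n)))
    good = sum(all(p[i] != i for i in range(n)) for p in perms)
    return good / len(perms)

for n in range(1, 9):
    print(n, derangement_prob_brute(n), derangement_prob(n))
print(derangement_prob(20), math.exp(-1))  # already indistinguishable in double precision
```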


R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 49 
DOI 10.1007/978-3-319-07965-3__5, © Springer International Publishing Switzerland 2014 
Here is an elegant, alternative proof using generating functions. Let $d_k^{(n)}$ denote the number of permutations in $S_n$ that fix exactly $k$ points. We need to calculate $\lim_{n\to\infty}\frac{d_0^{(n)}}{n!}$. Clearly,
$$\sum_{k=0}^n d_k^{(n)} = n!, \tag{5.1}$$
since every permutation fixes $k$ points, for some $k$. To construct a permutation in $S_n$ that fixes exactly $k$ points, first we can choose $k$ numbers from $[n]$ for the fixed points, and then we must choose a permutation of the other $n - k$ numbers that fixes none of them; thus,
$$d_k^{(n)} = \binom{n}{k}d_0^{(n-k)},$$
where we set $d_0^{(0)} = 1$. Substituting this in (5.1) gives
$$\sum_{k=0}^n \binom{n}{k}d_0^{(n-k)} = n!,$$
or equivalently
$$\sum_{k=0}^n \frac{1}{k!}\,\frac{d_0^{(n-k)}}{(n-k)!} = 1. \tag{5.2}$$
If one multiplies the absolutely convergent power series $\sum_{n=0}^\infty a_nx^n$ by the absolutely convergent power series $\sum_{n=0}^\infty b_nx^n$, one gets the absolutely convergent power series $\sum_{n=0}^\infty c_nx^n$, where $c_n = \sum_{j=0}^n a_jb_{n-j}$. Thus, it follows from (5.2) that
$$\Big(\sum_{n=0}^\infty \frac{x^n}{n!}\Big)\Big(\sum_{n=0}^\infty \frac{d_0^{(n)}}{n!}x^n\Big) = \sum_{n=0}^\infty x^n = \frac{1}{1-x},\quad |x| < 1,$$
or
$$\sum_{n=0}^\infty \frac{d_0^{(n)}}{n!}x^n = \frac{e^{-x}}{1-x},\quad |x| < 1. \tag{5.3}$$


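Thus $\frac{d_0^{(n)}}{n!}$ is the coefficient of $x^n$ in
$$\Big(1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \cdots\Big)\big(1 + x + x^2 + x^3 + \cdots\big),$$
and this is easily seen to give $\frac{d_0^{(n)}}{n!} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n\frac{1}{n!}$; thus, $\lim_{n\to\infty}\frac{d_0^{(n)}}{n!} = e^{-1}$.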
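The argument can be turned into a computation. In the Python sketch below (helper names are our own), `derangements` solves (5.1) for $d_0^{(n)}$ using $d_k^{(n)} = \binom{n}{k}d_0^{(n-k)}$. `cauchy_coefficient` computes the coefficient of $x^n$ in the product above with exact rational arithmetic. The two agree after dividing by $n!$.

```python
from fractions import Fraction
from math import comb, factorial

def derangements(N):
    """d_0^{(n)} for n = 0..N from d_0^{(n)} = n! - sum_{k=1}^n C(n,k) d_0^{(n-k)},
    which is (5.1) combined with d_k^{(n)} = C(n,k) d_0^{(n-k)}."""
    d = [1]  # d_0^{(0)} = 1: the empty permutation fixes no point
    for n in range(1, N + 1):
        d.append(factorial(n) - sum(comb(n, k) * d[n - k] for k in range(1, n + 1)))
    return d

def cauchy_coefficient(n):
    """Coefficient of x^n in (sum_j (-1)^j x^j / j!) * (sum_m x^m), computed exactly."""
    return sum(Fraction((-1) ** j, factorial(j)) for j in range(n + 1))

d = derangements(10)
print(d)  # [1, 0, 1, 2, 9, 44, 265, 1854, 14833, 133496, 1334961]
```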
In order to begin our study of the behavior of the number of cycles and of the number of cycles of fixed length in random permutations, we recall some basic facts and notation concerning cycles of permutations. Consider the permutation $\sigma\in S_4$ given in two-line form by $\begin{pmatrix}1&2&3&4\\2&4&1&3\end{pmatrix}$. This means that $\sigma_1=2$, $\sigma_2=4$, etc. Since 1 goes to 2, 2 goes to 4, 4 goes to 3, and 3 goes back to 1, we call $\sigma$ cyclic and denote this by writing $\sigma=(1\,2\,4\,3)$. (We could just as well write it as $(4\,3\,1\,2)$, for example.) Recall that every permutation can be decomposed into a product of disjoint cycles. For example, consider $\sigma\in S_8$ given by $\begin{pmatrix}1&2&3&4&5&6&7&8\\3&2&5&8&6&7&1&4\end{pmatrix}$. Under $\sigma$, 1 goes to 3, 3 goes to 5, 5 goes to 6, 6 goes to 7, and 7 goes back to 1, closing a cycle. Now 2 goes to 2, which makes a cycle unto itself, and finally, 4 goes to 8 and 8 goes back to 4. Therefore, we write $\sigma=(1\,3\,5\,6\,7)(2)(4\,8)$ or, alternatively, $\sigma=(1\,3\,5\,6\,7)(4\,8)$; in the latter form, the convention is that every number that does not appear at all forms a cycle unto itself. Note that $\sigma$ has one cycle of length 5, one cycle of length 2, and one cycle of length 1.
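A minimal Python sketch of the decomposition just described, assuming the permutation is stored as a dictionary `i -> sigma_i` (a representation chosen here only for illustration). Each new cycle is started at the smallest number not yet used, which reproduces the form $(1\,3\,5\,6\,7)(2)(4\,8)$ for the example in $S_8$.

```python
def cycles(sigma):
    """Disjoint cycle decomposition of a permutation of {1, ..., n} given as a dict i -> sigma_i."""
    seen, result = set(), []
    for start in sorted(sigma):          # smallest unused number opens the next cycle
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:             # follow i -> sigma_i until the cycle closes
            seen.add(i)
            cycle.append(i)
            i = sigma[i]
        result.append(tuple(cycle))
    return result

def cycle_type(sigma):
    """Return [C_1, ..., C_n], where C_j is the number of cycles of length j."""
    counts = [0] * len(sigma)
    for c in cycles(sigma):
        counts[len(c) - 1] += 1
    return counts

# The permutation of S_8 from the text: bottom row of its two-line form.
sigma = dict(zip(range(1, 9), [3, 2, 5, 8, 6, 7, 1, 4]))
print(cycles(sigma))      # [(1, 3, 5, 6, 7), (2,), (4, 8)]
print(cycle_type(sigma))  # [1, 1, 0, 0, 1, 0, 0, 0]
```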

For $\sigma\in S_n$ and $j\in[n]$, let $C_j^{(n)}(\sigma)$ denote the number of cycles of length $j$ in $\sigma$. Note that for all $\sigma\in S_n$, one has the identity

$$\sum_{j=1}^{n}jC_j^{(n)}(\sigma)=n.$$

We call $(C_1^{(n)}(\sigma),C_2^{(n)}(\sigma),\ldots,C_n^{(n)}(\sigma))$ the cycle type of the permutation $\sigma$. Let

$$N^{(n)}(\sigma)=\sum_{j=1}^{n}C_j^{(n)}(\sigma)$$

denote the number of cycles in the permutation $\sigma\in S_n$. Under the probability measure $P_n$, we may think of $N^{(n)}$ and $C_j^{(n)}$ as random variables. In this chapter we will investigate the limiting distribution of the random variable $N^{(n)}$ and of the random variable $C_j^{(n)}$ for fixed $j$, as $n\to\infty$. In fact, more generally, we will investigate the limiting distribution of the $j$-dimensional random vector $(C_1^{(n)},C_2^{(n)},\ldots,C_j^{(n)})$. We call these cycles small cycles because their lengths are fixed as $n\to\infty$.

Instead of just considering permutations under the uniform measure $P_n$, we will consider permutations under a one-parameter family of probability measures which includes the uniform measure as a particular case. For each $\theta\in(0,\infty)$, we define a probability measure $P_n^{(\theta)}$ on $S_n$ by

$$P_n^{(\theta)}(\{\sigma\})=\frac{\theta^{N^{(n)}(\sigma)}}{K_n(\theta)},$$

where

$$K_n(\theta)=\sum_{\sigma\in S_n}\theta^{N^{(n)}(\sigma)}$$

is the normalizing constant required to make $P_n^{(\theta)}$ a probability measure. Thus, under the measure $P_n^{(\theta)}$, every permutation is weighted proportionally by the parameter $\theta$ raised to an exponent equal to the number of cycles in the permutation. Consequently, for $\theta>1$, $P_n^{(\theta)}$ favors permutations with many cycles, and for $\theta<1$, it favors permutations with few cycles. Of course $\theta=1$ corresponds to the uniform measure: $P_n^{(1)}=P_n$. The original reason for considering the probability measures $P_n^{(\theta)}$ can be attributed to Proposition 5.1 below, which gives the exact distribution of the cycle types under $P_n^{(\theta)}$. In Exercise 5.1, the reader is asked to verify that Proposition 5.1 follows from the definition of $P_n^{(\theta)}$ along with Proposition 5.2 and Lemma 5.1, which are stated and proved in the course of the proofs of Theorems 5.1 and 5.2 below. We use the standard notation

$$\theta^{(n)}=\theta(\theta+1)\cdots(\theta+n-1),\quad n\ge1.$$

This expression is sometimes referred to as a rising factorial; the notation is called the Pochhammer symbol.


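Proposition 5.1. If $\sum_{j=1}^{n}ja_j=n$, then

$$P_n^{(\theta)}\big(C_1^{(n)}=a_1,C_2^{(n)}=a_2,\ldots,C_n^{(n)}=a_n\big)=\frac{n!}{\theta^{(n)}}\prod_{j=1}^{n}\Big(\frac{\theta}{j}\Big)^{a_j}\frac{1}{a_j!}.$$

The distribution in Proposition 5.1 is known as the Ewens sampling formula; it arose originally in the context of population genetics.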
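For small $n$, Proposition 5.1 can be checked exactly by enumerating $S_n$ and weighting each permutation by $\theta^{N^{(n)}(\sigma)}$. The sketch below uses exact rational arithmetic from Python's `fractions` module; the helper names are hypothetical, and the brute-force enumeration is only a check, not a substitute for the proof.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_type(p):
    """Cycle type (a_1, ..., a_n) of a permutation p of range(n), where p[i] is the image of i."""
    n = len(p)
    seen, a = [False] * n, [0] * n
    for start in range(n):
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = p[i]
            length += 1
        if length:
            a[length - 1] += 1
    return tuple(a)

def ewens_formula(a, theta):
    """Right-hand side of Proposition 5.1: (n!/theta^{(n)}) prod_j (theta/j)^{a_j} / a_j!."""
    n = len(a)
    rising = Fraction(1)
    for k in range(n):
        rising *= theta + k              # theta^{(n)} = theta (theta+1) ... (theta+n-1)
    value = factorial(n) / rising
    for j, aj in enumerate(a, start=1):
        value *= (theta / j) ** aj / factorial(aj)
    return value

def cycle_type_law(n, theta):
    """Exact law of the cycle type under P_n^{(theta)}, by enumerating all of S_n."""
    weights = {}
    for p in permutations(range(n)):
        a = cycle_type(p)
        weights[a] = weights.get(a, 0) + theta ** sum(a)   # weight theta^{N(sigma)}
    total = sum(weights.values())                           # this is K_n(theta)
    return {a: w / total for a, w in weights.items()}

print(ewens_formula((0, 0, 0, 0, 1), Fraction(1)))  # 1/5
```

The printed value is the probability that a uniformly random element of $S_5$ is a single 5-cycle, namely $4!/5!$.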

We will prove a weak law of large numbers for the distribution of the number of cycles $N^{(n)}$.

Theorem 5.1. Let $\theta\in(0,\infty)$. Under $P_n^{(\theta)}$, the distribution of the number of cycles $N^{(n)}$ in a permutation satisfies

$$\frac{N^{(n)}}{\log n}\to\theta\ \text{ in probability};$$

that is, for all $\epsilon>0$,

$$\lim_{n\to\infty}P_n^{(\theta)}\Big(\Big|\frac{N^{(n)}}{\log n}-\theta\Big|\ge\epsilon\Big)=0.$$
We now consider the small cycles. A random variable $Z$ is distributed according to the Poisson distribution with parameter $\lambda>0$ ($Z\sim\mathrm{Pois}(\lambda)$) if

$$P(Z=j)=e^{-\lambda}\frac{\lambda^j}{j!},\quad j=0,1,\ldots.$$

The $j$ discrete random variables $\{X_i\}_{i=1}^{j}$ are called independent if $P(X_1=x_1,\ldots,X_j=x_j)=\prod_{i=1}^{j}P(X_i=x_i)$, for all choices of $\{x_i\}_{i=1}^{j}\subset\mathbb{R}$. In the sequel, $Z_\lambda$ will denote a random variable distributed according to $\mathrm{Pois}(\lambda)$, and it will always be assumed that $\{Z_{\lambda_i}\}_{i=1}^{j}$ are independent for distinct $\{\lambda_i\}_{i=1}^{j}$.

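We will prove a weak convergence result for small cycles.

Theorem 5.2. Let $\theta\in(0,\infty)$. Let $j$ be a positive integer. Under the measure $P_n^{(\theta)}$, the distribution of the random vector $(C_1^{(n)},C_2^{(n)},\ldots,C_j^{(n)})$ converges weakly to the distribution of $(Z_\theta,Z_{\theta/2},\ldots,Z_{\theta/j})$. That is,

$$\lim_{n\to\infty}P_n^{(\theta)}\big(C_1^{(n)}=m_1,C_2^{(n)}=m_2,\ldots,C_j^{(n)}=m_j\big)=\prod_{i=1}^{j}e^{-\theta/i}\frac{(\theta/i)^{m_i}}{m_i!},\quad m_i\ge0,\ i=1,\ldots,j. \tag{5.4}$$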
Remark. Let $j$ be a positive integer and let $1\le k_1<k_2<\cdots<k_j$. In Exercise 5.7 the reader is asked to show that by making a small change in the proof of Theorem 5.2, one has

$$\lim_{n\to\infty}P_n^{(\theta)}\big(C_{k_1}^{(n)}=m_1,C_{k_2}^{(n)}=m_2,\ldots,C_{k_j}^{(n)}=m_j\big)=\prod_{i=1}^{j}e^{-\theta/k_i}\frac{(\theta/k_i)^{m_i}}{m_i!},\quad m_i\ge0,\ i=1,\ldots,j. \tag{5.5}$$

In particular, for any fixed $j$, the distribution of $C_j^{(n)}$ converges weakly to the $\mathrm{Pois}(\frac{\theta}{j})$ distribution. Actually, (5.5) can be deduced directly from (5.4); see Exercises 5.2 and 5.3.


Our proofs of these two theorems will be very combinatorial, through the method 
of generating functions. The use of purely probabilistic reasoning will be rather 
minimal. 

For the proofs of the two theorems, we will need to evaluate the normalizing constant $K_n(\theta)$. Of course, this is trivial in the case of the uniform measure, that is, the case $\theta=1$. Let $s(n,k)$ denote the number of permutations in $S_n$ that have exactly $k$ cycles. From the definition of $K_n(\theta)$, we have

$$K_n(\theta)=\sum_{k=1}^{n}s(n,k)\theta^k. \tag{5.6}$$
Proposition 5.2.

$$K_n(\theta)=\theta^{(n)}.$$

Remark. The numbers $s(n,k)$ are called unsigned Stirling numbers of the first kind. Proposition 5.2 and (5.6) show that they arise as the coefficients of the polynomials $q_n(\theta):=\theta^{(n)}=\theta(\theta+1)\cdots(\theta+n-1)$.


Proof. There are $(n-1)!$ permutations in $S_n$ that contain only one cycle and one permutation in $S_n$ that contains $n$ cycles:

$$s(n,1)=(n-1)!,\qquad s(n,n)=1. \tag{5.7}$$

We prove the following recursion relation:

$$s(n+1,k)=ns(n,k)+s(n,k-1),\quad n\ge2,\ 2\le k\le n. \tag{5.8}$$


Note that (5.7) and (5.8) uniquely determine $s(n,k)$ for all $n\ge1$ and all $k\in[n]$.

To create a permutation $\sigma'\in S_{n+1}$, we can start with a permutation $\sigma\in S_n$ and then take the number $n+1$ and either insert it into one of the existing cycles of $\sigma$ or let it stand alone as a cycle of its own. If we insert $n+1$ into one of the existing cycles, then $\sigma'$ will have $k$ cycles if and only if $\sigma$ has $k$ cycles. There are $n$ possible locations in which one can place the number $n+1$ and preserve the number of cycles. (The reader should verify this.) Thus, from each permutation in $S_n$ with $k$ cycles, we can construct $n$ permutations in $S_{n+1}$ with $k$ cycles. If, on the other hand, we let $n+1$ stand alone in its own cycle, then $\sigma'$ will have $k$ cycles if and only if $\sigma$ has $k-1$ cycles. Thus, from each permutation in $S_n$ with $k-1$ cycles, we can construct one permutation in $S_{n+1}$ with $k$ cycles. Now (5.8) is the mathematical expression of this verbal description.

Let $c_{n,k}$ denote the coefficient of $\theta^k$ in $q_n(\theta)=\theta(\theta+1)\cdots(\theta+n-1)$. Clearly $c_{n,1}=(n-1)!$ and $c_{n,n}=1$, for $n\ge1$. Writing $q_{n+1}(\theta)=q_n(\theta)(\theta+n)$, one sees that $c_{n+1,k}=nc_{n,k}+c_{n,k-1}$, for $n\ge2$, $2\le k\le n$. Thus, $c_{n,k}$ satisfies the same recursion relation (5.8) and the same boundary condition (5.7) as does $s(n,k)$. We conclude that $c_{n,k}=s(n,k)$. The proposition follows from this along with (5.6). □
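The proof can be mirrored in a few lines of Python (a sketch; the helper names are hypothetical). One function builds $s(n,k)$ from the recursion (5.8), another expands $q_n(\theta)=\theta(\theta+1)\cdots(\theta+n-1)$ by repeated multiplication by $(\theta+m)$, and a brute-force count over $S_n$ serves as an independent check.

```python
from itertools import permutations

def stirling_table(n_max):
    """s(n, k), 1 <= k <= n <= n_max, from s(1, 1) = 1 and s(n+1, k) = n s(n, k) + s(n, k-1).

    Using the recursion (5.8) also for k = 1 and k = n + 1 (absent entries read as 0)
    reproduces the boundary values (5.7): s(n+1, 1) = n! and s(n+1, n+1) = 1.
    """
    s = {(1, 1): 1}
    for n in range(1, n_max):
        for k in range(1, n + 2):
            s[(n + 1, k)] = n * s.get((n, k), 0) + s.get((n, k - 1), 0)
    return s

def rising_factorial_coefficients(n):
    """c[k] = coefficient of theta^k in q_n(theta) = theta (theta + 1) ... (theta + n - 1)."""
    c = [1]
    for m in range(n):   # multiply the current polynomial by (theta + m)
        c = [m * (c[k] if k < len(c) else 0) + (c[k - 1] if k else 0) for k in range(len(c) + 1)]
    return c

def count_by_number_of_cycles(n):
    """counts[k] = number of permutations of range(n) with exactly k cycles (brute force)."""
    counts = [0] * (n + 1)
    for p in permutations(range(n)):
        seen, k = [False] * n, 0
        for start in range(n):
            if not seen[start]:
                k += 1
                i = start
                while not seen[i]:
                    seen[i] = True
                    i = p[i]
        counts[k] += 1
    return counts

s = stirling_table(7)
print([s[(6, k)] for k in range(1, 7)])  # [120, 274, 225, 85, 15, 1]
```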


In light of Proposition 5.2, from now on, we write the probability measure $P_n^{(\theta)}$ in the form

$$P_n^{(\theta)}(\{\sigma\})=\frac{\theta^{N^{(n)}(\sigma)}}{\theta^{(n)}}.$$

We now set the stage to prove Theorem 5.1. The probability generating function $P_X(s)$ of a random variable $X$ taking nonnegative integral values is defined by

$$P_X(s)=Es^X=\sum_{i=0}^{\infty}s^iP(X=i),\quad |s|\le1.$$
The probability generating function uniquely determines the distribution; indeed, $\frac{1}{i!}\frac{d^iP_X(s)}{ds^i}\big|_{s=0}=P(X=i)$. Let $P_{N^{(n)}}(s;\theta)$ denote the probability generating function for the random variable $N^{(n)}$ under $P_n^{(\theta)}$:

$$P_{N^{(n)}}(s;\theta)=\sum_{i=1}^{n}s^iP_n^{(\theta)}(N^{(n)}=i).$$

Recalling that $s(n,i)$ denotes the number of permutations in $S_n$ with $i$ cycles, it follows that

$$P_n^{(\theta)}(N^{(n)}=i)=\frac{\theta^is(n,i)}{\theta^{(n)}}.$$

Using this with (5.6) and Proposition 5.2 gives

$$P_{N^{(n)}}(s;\theta)=\sum_{i=1}^{n}s^i\frac{\theta^is(n,i)}{\theta^{(n)}}=\frac{(s\theta)^{(n)}}{\theta^{(n)}}=\frac{s\theta(s\theta+1)\cdots(s\theta+n-1)}{\theta(\theta+1)\cdots(\theta+n-1)}=\prod_{i=1}^{n}\Big(\frac{\theta}{\theta+i-1}s+\frac{i-1}{\theta+i-1}\Big). \tag{5.9}$$


A random variable $X$ is distributed according to the Bernoulli distribution with parameter $p\in[0,1]$ if $P(X=1)=p$ and $P(X=0)=1-p$. We write $X\sim\mathrm{Ber}(p)$. The probability generating function for such a random variable is $ps+1-p$. Now let $\{X_{\theta/(\theta+i-1)}\}_{i=1}^{n}$ be independent random variables, where $X_{\theta/(\theta+i-1)}\sim\mathrm{Ber}(\frac{\theta}{\theta+i-1})$. Let $Z_{n,\theta}=\sum_{i=1}^{n}X_{\theta/(\theta+i-1)}$. Then the probability generating function for $Z_{n,\theta}$ is given by

$$P_{Z_{n,\theta}}(s)=Es^{Z_{n,\theta}}=Es^{\sum_{i=1}^{n}X_{\theta/(\theta+i-1)}}=\prod_{i=1}^{n}Es^{X_{\theta/(\theta+i-1)}}=\prod_{i=1}^{n}\Big(\frac{\theta}{\theta+i-1}s+\frac{i-1}{\theta+i-1}\Big). \tag{5.10}$$


For the third equality above we have used the fact that the expected value of a 
product of independent random variables is equal to the product of their expected 
values. From (5.9), (5.10), and the uniqueness of the probability generating function, 
we obtain the following proposition. 


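Proposition 5.3. Under $P_n^{(\theta)}$, the distribution of $N^{(n)}$ is equal to the distribution of $\sum_{i=1}^{n}X_{\theta/(\theta+i-1)}$, where $\{X_{\theta/(\theta+i-1)}\}_{i=1}^{n}$ are independent random variables, and $X_{\theta/(\theta+i-1)}\sim\mathrm{Ber}(\frac{\theta}{\theta+i-1})$.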
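Proposition 5.3 can be confirmed exactly for small $n$ and rational $\theta$: the sketch below (hypothetical helper names, exact `Fraction` arithmetic) computes the law $P_n^{(\theta)}(N^{(n)}=i)=\theta^is(n,i)/\theta^{(n)}$ from Stirling numbers and, separately, the law of a sum of independent $\mathrm{Ber}(\frac{\theta}{\theta+i-1})$ variables by repeated convolution, and compares the two lists.

```python
from fractions import Fraction

def stirling_row(n):
    """[s(n, 0), ..., s(n, n)], built with the recursion of (5.8), extended to all n >= 0, from s(0, 0) = 1."""
    row = [1]
    for m in range(n):   # row for m  ->  row for m + 1
        row = [m * (row[k] if k < len(row) else 0) + (row[k - 1] if k else 0)
               for k in range(len(row) + 1)]
    return row

def cycle_count_law(n, theta):
    """[P(N = 0), ..., P(N = n)] under P_n^{(theta)}, with P(N = i) = theta^i s(n, i) / theta^{(n)}."""
    weights = [theta ** i * c for i, c in enumerate(stirling_row(n))]
    total = sum(weights)   # equals theta^{(n)} by Proposition 5.2
    return [w / total for w in weights]

def bernoulli_sum_law(n, theta):
    """Law of X_1 + ... + X_n with independent X_i ~ Ber(theta / (theta + i - 1))."""
    law = [Fraction(1)]
    for i in range(1, n + 1):
        p = theta / (theta + i - 1)
        new = [Fraction(0)] * (len(law) + 1)
        for k, q in enumerate(law):
            new[k] += q * (1 - p)      # X_i = 0
            new[k + 1] += q * p        # X_i = 1
        law = new
    return law

print(cycle_count_law(4, Fraction(1)))
```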
Remark. As an alternative way of arriving at the result in the proposition, there is a nice probabilistic construction of uniformly random permutations ($\theta=1$) that immediately yields the result, and the construction can be amended to cover the case of general $\theta$. See Exercise 5.4.


We now use Proposition 5.3 and Chebyshev’s inequality to prove the first 
theorem. 


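Proof of Theorem 5.1. Let $Z_{n,\theta}=\sum_{i=1}^{n}X_{\theta/(\theta+i-1)}$. By Proposition 5.3, it suffices to show that

$$\lim_{n\to\infty}P\Big(\Big|\frac{Z_{n,\theta}}{\log n}-\theta\Big|\ge\epsilon\Big)=0,\quad\text{for all }\epsilon>0. \tag{5.11}$$

If $X_p\sim\mathrm{Ber}(p)$, then the expected value of $X_p$ is $EX_p=p$, and the variance is $\mathrm{Var}(X_p)=p(1-p)$. Since the expectation is linear, we have $EZ_{n,\theta}=\sum_{i=1}^{n}\frac{\theta}{\theta+i-1}$. By considering the above sum simultaneously as an upper Riemann sum and as a lower Riemann sum of appropriate integrals, we have

$$\theta\big(\log(n+\theta)-\log\theta\big)=\int_0^n\frac{\theta}{\theta+x}\,dx\le\sum_{i=1}^{n}\frac{\theta}{\theta+i-1}=EZ_{n,\theta}\le1+\int_0^{n-1}\frac{\theta}{\theta+x}\,dx=1+\theta\big(\log(n-1+\theta)-\log\theta\big).$$

Since $\log(n+\theta)=\log n+\log(1+\frac{\theta}{n})$ and $\log(n-1+\theta)=\log n+\log(1+\frac{\theta-1}{n})$, the above inequality immediately yields

$$EZ_{n,\theta}=\theta\log n+O(1),\quad\text{as }n\to\infty. \tag{5.12}$$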


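Since the variance of a sum of independent random variables is the sum of the variances of the random variables, we have $\mathrm{Var}(Z_{n,\theta})=\sum_{i=1}^{n}\frac{\theta(i-1)}{(\theta+i-1)^2}$. Similar to the integral estimate above for the expectation, and since $\frac{\theta(i-1)}{(\theta+i-1)^2}\le\frac{\theta}{i-1}$ for $i\ge2$, we have for $n\ge2$

$$\mathrm{Var}(Z_{n,\theta})\le\theta\sum_{i=2}^{n}\frac{1}{i-1}\le\theta+\theta\int_1^{n-1}\frac{1}{x}\,dx=\theta+\theta\log(n-1). \tag{5.13}$$

Using (5.12) for the last inequality below, we have for sufficiently large $n$

$$\begin{aligned}P\Big(\Big|\frac{Z_{n,\theta}}{\log n}-\theta\Big|\ge\epsilon\Big)&=P\big(|Z_{n,\theta}-\theta\log n|\ge\epsilon\log n\big)\\&=P\big(|(Z_{n,\theta}-EZ_{n,\theta})+(EZ_{n,\theta}-\theta\log n)|\ge\epsilon\log n\big)\\&\le P\big(|Z_{n,\theta}-EZ_{n,\theta}|\ge\epsilon\log n-|EZ_{n,\theta}-\theta\log n|\big)\\&\le P\big(|Z_{n,\theta}-EZ_{n,\theta}|\ge\tfrac12\epsilon\log n\big).\end{aligned} \tag{5.14}$$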
Applying Chebyshev’s inequality to the last term in (5.14), it follows from (5.13) and (5.14) that for sufficiently large $n$,

$$P\Big(\Big|\frac{Z_{n,\theta}}{\log n}-\theta\Big|\ge\epsilon\Big)\le\frac{4\big(\theta+\theta\log(n-1)\big)}{\epsilon^2\log^2n}. \tag{5.15}$$

Now (5.11) follows from (5.15). □
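To see the quantities in this proof concretely, the sketch below (Python; names are illustrative) computes the exact mean and variance of $Z_{n,\theta}$, checks the two integral-comparison bounds on $EZ_{n,\theta}$, and evaluates the Chebyshev bound $\mathrm{Var}(Z_{n,\theta})/(\frac12\epsilon\log n)^2$ obtained from the last line of (5.14). It is only an illustration: with $\theta=2$ and $\epsilon=\frac12$ the bound is still larger than 1 for these $n$, since it decays only like $1/\log n$.

```python
import math

def mean_var(n, theta):
    """Exact E Z and Var Z for Z = sum_{i=1}^n X_i with independent X_i ~ Ber(theta/(theta+i-1))."""
    mean = var = 0.0
    for i in range(1, n + 1):
        p = theta / (theta + i - 1)
        mean += p            # E X = p, and expectation is linear
        var += p * (1 - p)   # Var X = p(1 - p); variances add under independence
    return mean, var

def riemann_bounds(n, theta):
    """The integral-comparison bounds on E Z used to derive (5.12)."""
    lower = theta * (math.log(n + theta) - math.log(theta))
    upper = 1 + theta * (math.log(n - 1 + theta) - math.log(theta))
    return lower, upper

def chebyshev_bound(n, theta, eps):
    """Chebyshev: P(|Z - EZ| >= eps log n / 2) <= Var Z / (eps log n / 2)^2."""
    _, var = mean_var(n, theta)
    return var / (eps * math.log(n) / 2) ** 2

for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    mean, _ = mean_var(n, 2.0)
    print(n, round(mean - 2.0 * math.log(n), 4), round(chebyshev_bound(n, 2.0, 0.5), 3))
```

The middle column, $EZ_{n,\theta}-\theta\log n$, stays bounded, as (5.12) asserts.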


We now develop a framework that will lead to the proof of Theorem 5.2. Given a positive integer $n$ and given a collection $\{a_j\}_{j=1}^{n}$ of nonnegative integers satisfying $\sum_{j=1}^{n}ja_j=n$, let $c_n(a_1,\ldots,a_n)$ denote the number of permutations $\sigma\in S_n$ with cycle type $(a_1,\ldots,a_n)$. From the definition of $P_n^{(\theta)}$ we have

$$P_n^{(\theta)}\big(C_1^{(n)}=a_1,C_2^{(n)}=a_2,\ldots,C_n^{(n)}=a_n\big)=\frac{\theta^{\sum_{j=1}^{n}a_j}\,c_n(a_1,\ldots,a_n)}{\theta^{(n)}}.$$

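To prove Theorem 5.2, we need to analyze $P_n^{(\theta)}(C_1^{(n)}=m_1,\ldots,C_j^{(n)}=m_j)$, for large $n$ and fixed $j$. We have

$$\begin{aligned}&P_n^{(\theta)}\big(C_1^{(n)}=m_1,\ldots,C_j^{(n)}=m_j\big)\\&\quad=\sum_{\substack{\sum_{i=1}^{j}im_i+\sum_{i=j+1}^{n}ia_i=n\\a_{j+1}\ge0,\ldots,a_n\ge0}}P_n^{(\theta)}\big(C_1^{(n)}=m_1,\ldots,C_j^{(n)}=m_j,C_{j+1}^{(n)}=a_{j+1},\ldots,C_n^{(n)}=a_n\big)\\&\quad=\sum_{\substack{\sum_{i=1}^{j}im_i+\sum_{i=j+1}^{n}ia_i=n\\a_{j+1}\ge0,\ldots,a_n\ge0}}\frac{\theta^{\sum_{i=1}^{j}m_i+\sum_{i=j+1}^{n}a_i}\,c_n(m_1,\ldots,m_j,a_{j+1},\ldots,a_n)}{\theta^{(n)}}.\end{aligned} \tag{5.16}$$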


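We calculate $c_n(a_1,\ldots,a_n)$ by direct combinatorial reasoning.

Lemma 5.1.

$$c_n(a_1,\ldots,a_n)=\frac{n!}{\prod_{j=1}^{n}j^{a_j}a_j!}.$$

Remark. From the lemma and (5.16), we obtain

$$P_n^{(\theta)}\big(C_1^{(n)}=m_1,\ldots,C_j^{(n)}=m_j\big)=\frac{n!}{\theta^{(n)}}\prod_{i=1}^{j}\frac{(\theta/i)^{m_i}}{m_i!}\sum_{\substack{\sum_{i=1}^{j}im_i+\sum_{i=j+1}^{n}ia_i=n\\a_{j+1}\ge0,\ldots,a_n\ge0}}\ \prod_{i=j+1}^{n}\frac{(\theta/i)^{a_i}}{a_i!}.$$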
The sum on the right-hand side above is a real mess; however, a sophisticated application of generating functions in conjunction with the lemma will allow us to evaluate the right-hand side of (5.16) indirectly.


Proof of Lemma 5.1. First we separate out $a_1$ numbers for 1-cycles, $2a_2$ numbers for 2-cycles, $\ldots$, $(n-1)a_{n-1}$ numbers for $(n-1)$-cycles, and finally the last $na_n$ numbers for $n$-cycles. The number of ways of doing this is

$$\binom{n}{a_1}\binom{n-a_1}{2a_2}\binom{n-a_1-2a_2}{3a_3}\cdots\binom{n-a_1-\cdots-(n-1)a_{n-1}}{na_n}=\frac{n!}{a_1!(2a_2)!\cdots(na_n)!}.$$

The $a_1$ numbers selected for 1-cycles need no further differentiation. The $2a_2$ numbers selected for 2-cycles must be separated out into $a_2$ pairs. Of course the order of the pairs is irrelevant, so the number of ways of doing this is

$$\frac{1}{a_2!}\binom{2a_2}{2}\binom{2a_2-2}{2}\cdots\binom{4}{2}\binom{2}{2}=\frac{(2a_2)!}{a_2!\,(2!)^{a_2}}.$$

The $3a_3$ numbers selected for 3-cycles must be separated out into $a_3$ triplets, and then each such triplet must be ordered in a cycle. The number of ways of separating the $3a_3$ numbers into triplets is

$$\frac{1}{a_3!}\binom{3a_3}{3}\binom{3a_3-3}{3}\cdots\binom{6}{3}\binom{3}{3}=\frac{(3a_3)!}{a_3!\,(3!)^{a_3}}.$$

Each such triplet can be ordered into a cycle in $(3-1)!$ ways. Thus, we conclude that the $3a_3$ numbers can be arranged into $a_3$ 3-cycles in $\frac{(3a_3)!\,(2!)^{a_3}}{a_3!\,(3!)^{a_3}}$ ways. Continuing like this, we obtain

$$c_n(a_1,\ldots,a_n)=\frac{n!}{a_1!(2a_2)!\cdots(na_n)!}\cdot\frac{(2a_2)!}{a_2!\,(2!)^{a_2}}\cdot\frac{(3a_3)!\,(2!)^{a_3}}{a_3!\,(3!)^{a_3}}\cdots\frac{(na_n)!\,((n-1)!)^{a_n}}{a_n!\,(n!)^{a_n}}=\frac{n!}{a_1!a_2!\cdots a_n!\,2^{a_2}3^{a_3}\cdots n^{a_n}}.$$

□
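Lemma 5.1 (sometimes called Cauchy's formula) is easy to test by brute force for a small symmetric group. The sketch below (hypothetical helper names) tallies the cycle types of all $720$ elements of $S_6$ and compares each tally with $n!/\prod_j j^{a_j}a_j!$; it assumes Python 3.8 or later for `math.prod`.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def cauchy_count(a):
    """Lemma 5.1: number of permutations in S_n with cycle type a = (a_1, ..., a_n)."""
    n = len(a)
    return factorial(n) // prod(factorial(aj) * j ** aj for j, aj in enumerate(a, start=1))

def cycle_type(p):
    """Cycle type (a_1, ..., a_n) of a permutation p of range(n), where p[i] is the image of i."""
    n = len(p)
    seen, a = [False] * n, [0] * n
    for start in range(n):
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = p[i]
            length += 1
        if length:
            a[length - 1] += 1
    return tuple(a)

counts = Counter(cycle_type(p) for p in permutations(range(6)))
for a, c in sorted(counts.items()):
    print(a, c, cauchy_count(a))   # observed count and Lemma 5.1 side by side
```

There is one cycle type for each partition of 6, so eleven rows are printed.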

We now turn to generating functions. Consider an infinite-dimensional vector $x=(x_1,x_2,\ldots)$, and for any positive integer $n$, define $x^{(n)}=(x_1,\ldots,x_n)$. For $a=(a_1,\ldots,a_n)$, let $x^a=(x^{(n)})^a:=x_1^{a_1}\cdots x_n^{a_n}$. Let $\tau(\sigma)$ denote the cycle type of $\sigma\in S_n$. Define the cycle index of $S_n$, $n\ge1$, by
$$\phi_n(x)=\phi_n(x^{(n)})=\frac{1}{n!}\sum_{\sigma\in S_n}(x^{(n)})^{\tau(\sigma)}=\frac{1}{n!}\sum_{\substack{\sum_{j=1}^{n}ja_j=n\\a_1\ge0,\ldots,a_n\ge0}}c_n(a)(x^{(n)})^a.$$

We also define $\phi_0(x)=1$. We now consider (formally for the moment) the generating function for $\phi_n(\theta x)$:

$$G^{(\theta)}(x,t)=\sum_{n=0}^{\infty}\phi_n(\theta x)t^n.$$

Using Lemma 5.1, we can obtain a very nice representation for $G^{(\theta)}$, as well as a domain on which its defining series converges. Let $\|x\|_\infty:=\sup_{n\ge1}|x_n|$.


Proposition 5.4.

$$G^{(\theta)}(x,t)=\exp\Big(\theta\sum_{i=1}^{\infty}\frac{x_it^i}{i}\Big),\quad\text{for }|t|<1\text{ and }\|x\|_\infty<\infty.$$


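Proof. Consider $t\in[0,1)$ and $x$ with $x_j\ge0$ for all $j$, and $\|x\|_\infty<\infty$. Using Lemma 5.1 and the definition of $\phi_n(x)$, we have

$$\begin{aligned}G^{(\theta)}(x,t)&=\sum_{n=0}^{\infty}\sum_{\substack{\sum_{j=1}^{n}ja_j=n\\a_1\ge0,\ldots,a_n\ge0}}\frac{c_n(a)}{n!}(\theta x^{(n)})^at^n=\sum_{n=0}^{\infty}\sum_{\substack{\sum_{j=1}^{n}ja_j=n\\a_1\ge0,\ldots,a_n\ge0}}\frac{(\theta x_1)^{a_1}\cdots(\theta x_n)^{a_n}t^n}{\prod_{i=1}^{n}i^{a_i}a_i!}\\&=\sum_{n=0}^{\infty}\sum_{\substack{\sum_{j=1}^{n}ja_j=n\\a_1\ge0,\ldots,a_n\ge0}}\prod_{i=1}^{n}\frac{1}{a_i!}\Big(\frac{\theta x_it^i}{i}\Big)^{a_i}=\sum_{a_1\ge0,a_2\ge0,\ldots}\ \prod_{i=1}^{\infty}\frac{1}{a_i!}\Big(\frac{\theta x_it^i}{i}\Big)^{a_i}\\&=\prod_{i=1}^{\infty}\exp\Big(\frac{\theta x_it^i}{i}\Big)=\exp\Big(\theta\sum_{i=1}^{\infty}\frac{x_it^i}{i}\Big).\end{aligned} \tag{5.17}$$

(In the fourth expression the sum runs over all sequences $(a_1,a_2,\ldots)$ of nonnegative integers with only finitely many nonzero terms.) The right-hand side above converges for $t$ and $x$ in the range specified at the beginning of the proof. Since all of the summands in sight are nonnegative, it follows that the series defining $G^{(\theta)}$ is convergent in this range. For $t$ and $x$ in the range specified in the statement of the proposition, the above calculation shows that there is absolute convergence and hence convergence. □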
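Proposition 5.4 can be checked coefficient by coefficient. The sketch below (Python, exact `Fraction` arithmetic, hypothetical helper names) computes $\phi_n(\theta x)$ directly from Lemma 5.1 by summing over partitions of $n$, and independently computes the Taylor coefficients $g_n$ of $\exp(\theta\sum_i x_it^i/i)$ from the relation $g'=f'g$, which gives $ng_n=\theta\sum_{k=1}^{n}x_kg_{n-k}$. The particular values of $\theta$ and $x$ are arbitrary test inputs.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing lists of positive parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest

def phi_scaled(n, theta, x):
    """phi_n(theta x) via Lemma 5.1: sum over cycle types of prod_j (theta x_j / j)^{a_j} / a_j!.

    x[j - 1] plays the role of x_j.
    """
    total = Fraction(0)
    for parts in partitions(n):
        total += prod((theta * x[j - 1] / j) ** aj / factorial(aj)
                      for j, aj in Counter(parts).items())
    return total

def exp_coefficients(theta, x, n_max):
    """Taylor coefficients g_0, ..., g_{n_max} of exp(theta * sum_i x_i t^i / i), via n g_n = theta sum_k x_k g_{n-k}."""
    g = [Fraction(1)]
    for n in range(1, n_max + 1):
        g.append(theta * sum(x[k - 1] * g[n - k] for k in range(1, n + 1)) / n)
    return g

theta = Fraction(3, 2)
x = [Fraction(1, 2), Fraction(3), Fraction(2), Fraction(1, 3), Fraction(0), Fraction(5)]
g = exp_coefficients(theta, x, 6)
print([phi_scaled(n, theta, x) == g[n] for n in range(7)])
```

With every $x_j=1$ the same computation gives $\phi_n(\theta\mathbf 1)=\theta^{(n)}/n!$, which is Proposition 5.2 in disguise.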


We now exploit the formula for $G^{(\theta)}(x,t)$ in Proposition 5.4 in a clever way. Recall that
$$\log(1-t)=-\sum_{i=1}^{\infty}\frac{t^i}{i},\quad|t|<1. \tag{5.18}$$

For $x=(x_1,x_2,\ldots)$ and a positive integer $j$, let $x^{[j]}=(x_1,\ldots,x_j,1,1,\ldots)$. In other words, $x^{[j]}$ is the infinite-dimensional vector which coincides with $x$ in its first $j$ places and has 1 in all of its other places. From Proposition 5.4 and (5.18) we have

$$G^{(\theta)}(x^{[j]},t)=\exp\Big(\theta\sum_{i=1}^{j}\frac{x_it^i}{i}\Big)\exp\Big(\theta\sum_{i=j+1}^{\infty}\frac{t^i}{i}\Big)=\frac{\exp\big(\theta\sum_{i=1}^{j}\frac{(x_i-1)t^i}{i}\big)}{(1-t)^\theta},\quad|t|<1. \tag{5.19}$$
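We will need the following lemma.

Lemma 5.2. Let $\theta\in(0,\infty)$. Let $\sum_{i=0}^{\infty}b_i$ be a convergent series, and assume that

$$\frac{1}{(1-t)^\theta}\sum_{i=0}^{\infty}b_it^i=\sum_{n=0}^{\infty}\gamma_nt^n,\quad|t|<1.$$

If $\theta>1$, also assume that $\sum_{i=0}^{\infty}|b_i|<\infty$. If $\theta\in(0,1)$, also assume that $\sum_{i=0}^{\infty}s^i|b_i|<\infty$, for some $s>1$. Then

$$\lim_{n\to\infty}\frac{n!}{\theta^{(n)}}\gamma_n=\sum_{i=0}^{\infty}b_i.$$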


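Proof. Since $\frac{d^n}{dt^n}(1-t)^{-\theta}\big|_{t=0}=\theta(\theta+1)\cdots(\theta+n-1)=\theta^{(n)}$, the Taylor expansion for $\frac{1}{(1-t)^\theta}$ is given by

$$\frac{1}{(1-t)^\theta}=\sum_{n=0}^{\infty}\frac{\theta^{(n)}}{n!}t^n,\quad|t|<1, \tag{5.20}$$

where for convenience we have defined $\theta^{(0)}=1$. Thus, the Taylor expansion for $\frac{1}{(1-t)^\theta}\sum_{i=0}^{\infty}b_it^i$ is given by

$$\frac{1}{(1-t)^\theta}\sum_{i=0}^{\infty}b_it^i=\sum_{n=0}^{\infty}d_nt^n,$$

where $d_n=\sum_{i=0}^{n}b_i\frac{\theta^{(n-i)}}{(n-i)!}$. Therefore, by the assumption in the lemma, we have

$$\gamma_n=\sum_{i=0}^{n}b_i\frac{\theta^{(n-i)}}{(n-i)!}.$$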
If $\theta=1$, then $k!=1^{(k)}$, for all $k$. Consequently the above equation reduces to $\gamma_n=\sum_{i=0}^{n}b_i$, and thus the statement of the lemma holds. When $\theta\ne1$, then using the additional assumptions on $\{b_i\}_{i=0}^{\infty}$, we can show that

$$\lim_{n\to\infty}\frac{n!}{\theta^{(n)}}\sum_{i=0}^{n}b_i\frac{\theta^{(n-i)}}{(n-i)!}=\sum_{i=0}^{\infty}b_i, \tag{5.21}$$

which finishes the proof of the lemma. The reader is guided through a proof of (5.21) in Exercise 5.5. □
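A numerical illustration of Lemma 5.2 (not a proof), written in Python with hypothetical names. It takes $b_i=(-1)^i/i!$, which satisfies every hypothesis of the lemma and has $\sum_ib_i=e^{-1}$, builds $w_k=\theta^{(k)}/k!$ from (5.20) by the recursion $w_k=w_{k-1}(\theta+k-1)/k$, and evaluates $\frac{n!}{\theta^{(n)}}\gamma_n=\gamma_n/w_n$ for a few $n$ and $\theta$.

```python
import math

def ratio_lemma_5_2(b, theta, n):
    """(n!/theta^{(n)}) * gamma_n with gamma_n = sum_{i=0}^n b_i theta^{(n-i)}/(n-i)!.

    w[k] = theta^{(k)}/k! are the coefficients (5.20) of (1 - t)^(-theta), built with
    w[k] = w[k-1] * (theta + k - 1) / k; note n!/theta^{(n)} = 1 / w[n].
    """
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (theta + k - 1) / k)
    gamma_n = sum(b[i] * w[n - i] for i in range(n + 1))
    return gamma_n / w[n]

def alternating_exp_coefficients(n):
    """b_i = (-1)^i / i!, i = 0..n, computed iteratively (their full sum is 1/e)."""
    b = [1.0]
    for i in range(1, n + 1):
        b.append(-b[-1] / i)
    return b

b = alternating_exp_coefficients(1000)
for theta in (0.5, 1.0, 3.0):
    print(theta, [round(ratio_lemma_5_2(b, theta, n), 6) for n in (10, 100, 1000)], math.exp(-1))
```

For $\theta=1$ every $w_k$ equals 1 and the ratio is exactly the partial sum $\sum_{i\le n}b_i$, matching the first case of the proof.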


We can now give the proof of Theorem 5.2. 


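Proof of Theorem 5.2. From (5.19) and the original definition of $G^{(\theta)}(x,t)$, we have

$$\frac{\exp\big(\theta\sum_{i=1}^{j}\frac{(x_i-1)t^i}{i}\big)}{(1-t)^\theta}=G^{(\theta)}(x^{[j]},t)=\sum_{n=0}^{\infty}\phi_n(\theta x^{[j]})t^n. \tag{5.22}$$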


Considering $x$ and $\theta$ as constants, we apply Lemma 5.2 to (5.22). In terms of the lemma, we have

$$\gamma_n=\phi_n(\theta x^{[j]})\quad\text{and}\quad\sum_{i=0}^{\infty}b_it^i=\exp\Big(\theta\sum_{i=1}^{j}\frac{(x_i-1)t^i}{i}\Big). \tag{5.23}$$

In order to be able to apply the lemma for all $\theta>0$, we need to show that $\sum_{i=0}^{\infty}s^i|b_i|<\infty$, for some $s>1$. Define $\{\hat b_i\}_{i=0}^{\infty}$ by

$$\exp\Big(\theta\sum_{i=1}^{j}\frac{|x_i-1|\,t^i}{i}\Big)=\sum_{i=0}^{\infty}\hat b_it^i. \tag{5.24}$$

Since all of the coefficients in the sum in the exponent on the left-hand side of (5.24) are nonnegative, we have $\hat b_i\ge|b_i|\ge0$, for all $i$. The reader is asked to prove this in Exercise 5.6. The function on the left-hand side of (5.24) is real analytic for all $t\in\mathbb{R}$ (and complex analytic for all complex $t$); consequently, its power series on the right-hand side converges for all $t\in\mathbb{R}$. From this and the nonnegativity of $\hat b_i$, it follows that $\sum_{i=0}^{\infty}s^i\hat b_i<\infty$, for all $s>0$, and then, since $|b_i|\le\hat b_i$, we conclude that $\sum_{i=0}^{\infty}s^i|b_i|<\infty$, for all $s>0$.
By definition, from (5.23), we have

$$\sum_{i=0}^{\infty}b_i=\exp\Big(\theta\sum_{i=1}^{j}\frac{x_i-1}{i}\Big). \tag{5.25}$$
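Consider now

$$\frac{n!}{\theta^{(n)}}\gamma_n=\frac{n!}{\theta^{(n)}}\phi_n(\theta x^{[j]})=\frac{1}{\theta^{(n)}}\sum_{\substack{\sum_{i=1}^{n}ia_i=n\\a_1\ge0,\ldots,a_n\ge0}}c_n(a)(\theta x^{[j]})^a. \tag{5.26}$$

For any given $j$-vector $(m_1,\ldots,m_j)$ with nonnegative integral entries, the coefficient of $x_1^{m_1}x_2^{m_2}\cdots x_j^{m_j}$ in (5.26) is

$$\frac{1}{\theta^{(n)}}\sum_{\substack{\sum_{i=1}^{j}im_i+\sum_{i=j+1}^{n}ia_i=n\\a_{j+1}\ge0,\ldots,a_n\ge0}}\theta^{\sum_{i=1}^{j}m_i+\sum_{i=j+1}^{n}a_i}\,c_n(m_1,\ldots,m_j,a_{j+1},\ldots,a_n).$$

But by (5.16), this is exactly $P_n^{(\theta)}(C_1^{(n)}=m_1,C_2^{(n)}=m_2,\ldots,C_j^{(n)}=m_j)$. By Lemma 5.2, $\lim_{n\to\infty}\frac{n!}{\theta^{(n)}}\gamma_n$ exists, and this is true for every choice of $x$ and $\theta$; thus, we conclude that

$$\lim_{n\to\infty}\frac{n!}{\theta^{(n)}}\gamma_n=\lim_{n\to\infty}\frac{n!}{\theta^{(n)}}\phi_n(\theta x^{[j]})=\sum_{m_1\ge0,\ldots,m_j\ge0}p_{m_1,\ldots,m_j}(\theta)\,x_1^{m_1}\cdots x_j^{m_j}, \tag{5.27}$$

where

$$p_{m_1,\ldots,m_j}(\theta)=\lim_{n\to\infty}P_n^{(\theta)}\big(C_1^{(n)}=m_1,C_2^{(n)}=m_2,\ldots,C_j^{(n)}=m_j\big). \tag{5.28}$$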


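Applying Lemma 5.2, we conclude from (5.25) and (5.27) that

$$\exp\Big(\theta\sum_{i=1}^{j}\frac{x_i-1}{i}\Big)=\sum_{m_1\ge0,\ldots,m_j\ge0}p_{m_1,\ldots,m_j}(\theta)\,x_1^{m_1}\cdots x_j^{m_j}. \tag{5.29}$$

On the one hand, (5.29) shows that the coefficient of $x_1^{m_1}\cdots x_j^{m_j}$ in the Taylor expansion about $x=0$ of the function $\exp(\theta\sum_{i=1}^{j}\frac{x_i-1}{i})$ is $p_{m_1,\ldots,m_j}(\theta)$. On the other hand, by Taylor’s formula, this coefficient is equal to

$$\frac{1}{m_1!\cdots m_j!}\frac{\partial^{m_1+\cdots+m_j}}{\partial x_1^{m_1}\cdots\partial x_j^{m_j}}\exp\Big(\theta\sum_{i=1}^{j}\frac{x_i-1}{i}\Big)\Big|_{x=0}=\frac{1}{m_1!\cdots m_j!}\prod_{i=1}^{j}\Big(\frac{\theta}{i}\Big)^{m_i}\exp\Big(-\theta\sum_{i=1}^{j}\frac{1}{i}\Big)=\prod_{i=1}^{j}e^{-\theta/i}\frac{(\theta/i)^{m_i}}{m_i!}. \tag{5.30}$$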




Thus, from (5.28)–(5.30), we conclude that

$$\lim_{n\to\infty}P_n^{(\theta)}\big(C_1^{(n)}=m_1,C_2^{(n)}=m_2,\ldots,C_j^{(n)}=m_j\big)=\prod_{i=1}^{j}e^{-\theta/i}\frac{(\theta/i)^{m_i}}{m_i!},$$

completing the proof of Theorem 5.2. □
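The Remark after Lemma 5.1 also gives a practical way to compute $P_n^{(\theta)}(C_1^{(n)}=m_1,\ldots,C_j^{(n)}=m_j)$ exactly for moderately large $n$: the inner sum is the coefficient of $t^r$, $r=n-\sum_iim_i$, in $\exp(\theta\sum_{i>j}t^i/i)$, and by (5.18) that series equals $(1-t)^{-\theta}\exp(-\theta\sum_{i\le j}t^i/i)$. The Python sketch below (hypothetical names, floating point) implements this and compares it with the Poisson limit (5.4); it is an illustration of Theorem 5.2, not part of its proof.

```python
import math

def pochhammer_over_factorial(theta, n):
    """w[k] = theta^{(k)} / k!, k = 0..n: the coefficients (5.20) of (1 - t)^(-theta)."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (theta + k - 1) / k)
    return w

def small_cycle_prob(n, theta, m):
    """Exact P_n^{(theta)}(C_1 = m[0], ..., C_j = m[j-1]), j = len(m), via Lemma 5.1 and (5.16)."""
    j = len(m)
    r = n - sum(i * mi for i, mi in enumerate(m, start=1))
    if r < 0:
        return 0.0
    w = pochhammer_over_factorial(theta, n)
    # h[k]: coefficients of exp(-theta sum_{i<=j} t^i / i), from k h_k = -theta sum_{i<=min(k,j)} h_{k-i}
    h = [1.0]
    for k in range(1, r + 1):
        h.append(-theta * sum(h[k - i] for i in range(1, min(k, j) + 1)) / k)
    coeff = sum(h[k] * w[r - k] for k in range(r + 1))   # [t^r] of exp(theta sum_{i>j} t^i / i)
    prefactor = math.prod((theta / i) ** mi / math.factorial(mi) for i, mi in enumerate(m, start=1))
    return prefactor * coeff / w[n]                       # n!/theta^{(n)} = 1 / w[n]

def poisson_limit(theta, m):
    """Right-hand side of (5.4): prod_i e^{-theta/i} (theta/i)^{m_i} / m_i!."""
    return math.prod(math.exp(-theta / i) * (theta / i) ** mi / math.factorial(mi)
                     for i, mi in enumerate(m, start=1))

for m in ([0, 0], [1, 0], [2, 1]):
    print(m, small_cycle_prob(400, 2.0, m), poisson_limit(2.0, m))
```

A small sanity check: with $n=3$, $\theta=1$ and $m=(0)$ the function returns the probability $\frac13$ that a uniform permutation of three numbers has no fixed point.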


Exercise 5.1. Verify that Proposition 5.1 follows from the definition of $P_n^{(\theta)}$ along with Proposition 5.2 and Lemma 5.1.


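Exercise 5.2. Show that (5.4) is equivalent to

$$\lim_{n\to\infty}P_n^{(\theta)}\big(C_1^{(n)}\le m_1,C_2^{(n)}\le m_2,\ldots,C_j^{(n)}\le m_j\big)=\sum_{0\le r_1\le m_1,\ldots,0\le r_j\le m_j}\ \prod_{i=1}^{j}e^{-\theta/i}\frac{(\theta/i)^{r_i}}{r_i!},\quad m_i\ge0,\ i=1,\ldots,j. \tag{5.31}$$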


Exercise 5.3. In this exercise you will show directly that (5.5) follows from (5.4).

(a) Fix an integer $j\ge2$. Use (5.31) to show that for any $\epsilon>0$, there exists an $N_\epsilon$ such that if $n\ge N_\epsilon$ and $m\ge N_\epsilon$, then

$$P_n^{(\theta)}\big(C_i^{(n)}>m,\ \text{for some }i\in[j]\big)<\epsilon. \tag{5.32}$$

(b) From (5.31) and (5.32), deduce that (5.31) also holds if some of the $m_i$ are equal to $\infty$.
(c) Prove that (5.5) follows from (5.4).


Exercise 5.4. This exercise gives an alternative probabilistic proof of Proposition 5.3. A uniformly random ($\theta=1$) permutation $\sigma\in S_n$ can be constructed in the following manner via its cycles. We begin with the number 1. Now we randomly choose a number from $[n]$. If we chose $j$, then we declare that $\sigma_1=j$. This is the first stage of the construction. If $j\ne1$, then we randomly choose a number from $[n]-\{j\}$. If we chose $k$, then we declare that $\sigma_j=k$. This is the second stage of the construction. If $k\ne1$, then we randomly choose a number from $[n]-\{j,k\}$. We continue like this until we finally choose 1, which closes the cycle. For example, if after $k$ we chose 1, then the permutation $\sigma$ would contain the cycle $(1\,j\,k)$. Once we close a cycle, we begin again, starting with the smallest number that has not yet been used. We continue like this for $n$ stages, at which point the permutation $\sigma$ has been defined completely.

(a) The above construction has $n$ stages. Show that the probability of completing a cycle on the $j$th stage is $\frac{1}{n-j+1}$. Thus, letting

$$X_j^{(n)}=\begin{cases}1,&\text{if a cycle was completed at stage }j;\\0,&\text{otherwise,}\end{cases}$$

it follows that $X_j^{(n)}\sim\mathrm{Ber}(\frac{1}{n-j+1})$.
(b) Argue that $\{X_j^{(n)}\}_{j=1}^{n}$ are independent.
(c) Show that the number of cycles $N^{(n)}$ can be represented as $N^{(n)}=\sum_{j=1}^{n}X_j^{(n)}$, thereby proving Proposition 5.3 in the case $\theta=1$.
(d) Let $\theta\in(0,\infty)$. Amend the above construction as follows. At any stage $j$, close the cycle with probability $\frac{\theta}{n+\theta-j}$ and choose any other particular number that has not yet been used with probability $\frac{1}{n+\theta-j}$. Show that this construction yields a permutation distributed according to $P_n^{(\theta)}$, and use the above reasoning to prove Proposition 5.3 for all $\theta>0$.
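The construction in part (d) translates directly into a sampler. The Python sketch below is one possible implementation (the function name and the list-based bookkeeping are choices made here, not taken from the text); it returns the cycles it builds. As a rough check, the average number of cycles for $\theta=1$ should be close to $\sum_{i=1}^{n}\frac{1}{i}$, the mean of the Bernoulli sum in Proposition 5.3.

```python
import random

def sample_cycles(n, theta, rng):
    """Grow a random permutation of {1, ..., n} cycle by cycle, as in Exercise 5.4(d).

    At stage j (j = 1, ..., n) the current cycle is closed with probability
    theta / (n - j + theta); otherwise each of the n - j numbers not yet used is
    appended with probability 1 / (n - j + theta).  With theta = 1 this is the
    uniform construction of the exercise.  Returns the list of cycles.
    """
    unused = list(range(2, n + 1))   # kept sorted, so unused[0] is the smallest
    cycles, current = [], [1]
    for j in range(1, n + 1):
        # writing the denominator as (n - j) + theta makes it exactly theta at j = n
        if rng.random() < theta / ((n - j) + theta):
            cycles.append(current)
            if unused:
                current = [unused.pop(0)]   # restart at the smallest unused number
        else:
            current.append(unused.pop(rng.randrange(len(unused))))
    return cycles

print(sample_cycles(10, 2.0, random.Random(1)))
```

Using a seeded `random.Random` instance keeps runs reproducible.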


Exercise 5.5. (a) Show that if $\sum_{i=0}^{\infty}|b_i|<\infty$ and the triangular array $\{c_{n,i}:i=0,1,\ldots,n;\ n=0,1,\ldots\}$ is bounded and satisfies $\lim_{n\to\infty}c_{n,n-i}=1$, for all $i$, then $\lim_{n\to\infty}\sum_{i=0}^{n}b_ic_{n,n-i}=\sum_{i=0}^{\infty}b_i$. Then use this to prove (5.21) in the case that $\theta>1$.

(b) Show that if $\theta\in(0,1)$, then $\frac{\theta^{(n-i)}}{(n-i)!}=\prod_{l=1}^{n-i}\frac{\theta+l-1}{l}\le1$. Also, $\frac{n!}{\theta^{(n)}}\le\frac{n}{\theta}$. Here we recall that $\theta^{(0)}=1$.

(c) Show that if $\sum_{i=0}^{\infty}|b_i|s^i<\infty$, where $s>1$, then $|b_i|\le s^{-i}$, for all large $i$.

(d) For $\theta\in(0,1)$, prove (5.21) as follows. Break the sum $\frac{n!}{\theta^{(n)}}\sum_{i=0}^{n}b_i\frac{\theta^{(n-i)}}{(n-i)!}$ into three parts: from $i=0$ to $i=N$, from $i=N+1$ to $i=[\frac{n}{2}]$, and from $i=[\frac{n}{2}]+1$ to $i=n$. Use the reasoning in the proof of (a) to show that by choosing $N$ sufficiently large, the limit as $n\to\infty$ of the first part can be made arbitrarily close to $\sum_{i=0}^{\infty}b_i$. Use the fact that $\sum_{i=0}^{\infty}|b_i|<\infty$ to show that by choosing $N$ sufficiently large, the $\limsup_{n\to\infty}$ of the second part can be made arbitrarily small. Use (b) and (c) to show that the limit as $n\to\infty$ of the third part is 0.


Exercise 5.6. Prove that $\hat b_i\ge|b_i|$, where $\{b_i\}_{i=0}^{\infty}$ and $\{\hat b_i\}_{i=0}^{\infty}$ are defined in (5.23) and (5.24).


Exercise 5.7. Make a small change in the proof of Theorem 5.2 to show that (5.5) 
holds. 


Exercise 5.8. Consider the uniform probability measure $P_n$ on $S_n$ and let $E_n$ denote the expectation under $P_n$. Let $X_n=X_n(\sigma)$ be the random variable denoting the number of nearest neighbor pairs in the permutation $\sigma\in S_n$, and let $Y_n=Y_n(\sigma)$ be the random variable denoting the number of nearest neighbor triples in $\sigma\in S_n$. (A nearest neighbor pair for $\sigma$ is a pair $k,k+1$, with $k\in[n-1]$, such that $\sigma_i=k$ and $\sigma_{i+1}=k+1$, for some $i\in[n-1]$, and a nearest neighbor triple is a triple $(k,k+1,k+2)$ with $k\in[n-2]$ such that $\sigma_i=k$, $\sigma_{i+1}=k+1$ and $\sigma_{i+2}=k+2$, for some $i\in[n-2]$.)

(a) Show that $E_nX_n=1-\frac{1}{n}$, for all $n$. (Hint: Represent $X_n$ as the sum of indicator random variables $\{I_k\}_{k=1}^{n-1}$, where $I_k(\sigma)$ is equal to 1 if $k,k+1$ is a nearest neighbor pair in $\sigma$ and is equal to 0 otherwise.) It can be shown that the distribution of $X_n$ converges weakly to the $\mathrm{Pois}(1)$ distribution as $n\to\infty$; see [17].

(b) Show that $\lim_{n\to\infty}E_nY_n=0$ and conclude that $\lim_{n\to\infty}P_n(Y_n=0)=1$.


Chapter Notes 


In this chapter we investigated the limiting distribution as $n\to\infty$ of the random vector denoting the number of cycles of lengths 1 through $j$ in a random permutation from $S_n$. It is very interesting and more challenging to investigate the limiting distribution of the random vector denoting the $j$ longest cycles or, alternatively, the $j$ shortest cycles. For everything you want to know about cycles in random permutations, and lots of references, see the book by Arratia et al. [6]. Our approach in this chapter was almost completely combinatorial, through the use of generating functions. Such methods are used occasionally in [6], but the emphasis is on more sophisticated probabilistic analysis. Our method is similar to the generating function approach of Wilf in [34], which deals only with the case $\theta=1$. For an expository account of the intertwining of combinatorial objects with stochastic processes, see the lecture notes of Pitman [30].


Chapter 6 
Chebyshev’s Theorem on the Asymptotic Density of the Primes


Let $\pi(n)$ denote the number of primes that are no larger than $n$; that is,

$$\pi(n)=\sum_{p\le n}1,$$

where here and elsewhere in this chapter and the next two, the letter $p$ in a summation denotes a prime. Euclid proved that there are infinitely many primes: $\lim_{n\to\infty}\pi(n)=\infty$. The asymptotic density of the primes is 0; that is,

$$\lim_{n\to\infty}\frac{\pi(n)}{n}=0.$$

The prime number theorem gives the leading-order asymptotic behavior of $\pi(n)$. It states that

$$\lim_{n\to\infty}\frac{\pi(n)\log n}{n}=1.$$


This landmark result was proved in 1896 independently by J. Hadamard and by C.J. de la Vallée Poussin. Their proofs used contour integration and Cauchy’s theorem from analytic function theory. A so-called “elementary” proof, that is, a proof that does not use analytic function theory, was given by P. Erdős and A. Selberg in 1949. Although their proof uses only elementary methods, it is certainly more involved than the proofs of Hadamard and de la Vallée Poussin. We will not prove the prime number theorem in this book. In this chapter we prove a precursor of the prime number theorem, due to Chebyshev in 1850. Chebyshev was the first to prove that $\pi(n)$ grows on the order $\frac{n}{\log n}$; Chebyshev’s methods were ingenious but entirely elementary. Given the truly elementary nature of his approach, it is quite impressive how close his result is to the prime number theorem. Here is Chebyshev’s result.
Theorem 6.1 (Chebyshev).

$$0.693\approx\log2\le\liminf_{n\to\infty}\frac{\pi(n)\log n}{n}\le\limsup_{n\to\infty}\frac{\pi(n)\log n}{n}\le\log4\approx1.386.$$


Chebyshev’s result is not the type of result we are emphasizing in this book, since it is not an exact asymptotic result but rather only an estimate. We have included the result because we will need it to prove Mertens’ theorems in Chap. 7, and one of Mertens’ theorems will be used to prove the Hardy–Ramanujan theorem in Chap. 8.

Define Chebyshev’s $\theta$-function by

$$\theta(n)=\sum_{p\le n}\log p. \tag{6.1}$$

Chebyshev realized that an understanding of the asymptotic behavior of $\theta(n)$ allows one to infer the asymptotic behavior of $\pi(n)$ (and vice versa), and that the direct asymptotic analysis of the function $\theta$ is much more tractable than that of the function $\pi$, because the sum of logarithms is the logarithm of the product. Indeed, note that

$$\theta(n)=\log\prod_{p\le n}p. \tag{6.2}$$

We will give an exceedingly simple proof of the following result, which links the asymptotic behavior of $\theta$ to that of $\pi$.


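Proposition 6.1.
(i) $\liminf_{n\to\infty}\frac{\theta(n)}{n}=\liminf_{n\to\infty}\frac{\pi(n)\log n}{n}$;
(ii) $\limsup_{n\to\infty}\frac{\theta(n)}{n}=\limsup_{n\to\infty}\frac{\pi(n)\log n}{n}$.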


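Proof. We have the trivial inequality

$$\theta(n)=\sum_{p\le n}\log p\le\pi(n)\log n.$$

Dividing this by $n$ and letting $n\to\infty$, we obtain

$$\liminf_{n\to\infty}\frac{\theta(n)}{n}\le\liminf_{n\to\infty}\frac{\pi(n)\log n}{n},\qquad\limsup_{n\to\infty}\frac{\theta(n)}{n}\le\limsup_{n\to\infty}\frac{\pi(n)\log n}{n}. \tag{6.3}$$

We have for $\epsilon\in(0,1)$,

$$\theta(n)\ge\sum_{[n^{1-\epsilon}]<p\le n}\log p\ge\big(\pi(n)-\pi([n^{1-\epsilon}])\big)\log n^{1-\epsilon}\ge(1-\epsilon)\big(\pi(n)\log n-[n^{1-\epsilon}]\log n\big),$$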
where the last inequality comes from the trivial fact that $\pi(y)\le y$. Dividing this by $n$ and letting $n\to\infty$, and using the fact that $\epsilon\in(0,1)$ is arbitrary, we obtain

$$\liminf_{n\to\infty}\frac{\theta(n)}{n}\ge\liminf_{n\to\infty}\frac{\pi(n)\log n}{n},\qquad\limsup_{n\to\infty}\frac{\theta(n)}{n}\ge\limsup_{n\to\infty}\frac{\pi(n)\log n}{n}. \tag{6.4}$$

The proposition follows from (6.3) and (6.4). □


The following theorem gives an upper bound on the Chebyshev $\theta$-function.

Theorem 6.2. $\frac{\theta(n)}{n}<\log4$, $n\ge1$.


Proof. The proof is by induction, the inductive hypothesis being that $\theta(n)<n\log4$. Note that the hypothesis holds for $n=1,2$. If $n+1\ge3$ is even, then $\theta(n+1)=\theta(n)<n\log4<(n+1)\log4$, where the first inequality comes from the inductive hypothesis. If $n+1$ is odd, then write $n+1=2m+1$, and note that the binomial coefficient $\binom{2m+1}{m}=\frac{(2m+1)!}{m!\,(m+1)!}=\frac{(2m+1)(2m)\cdots(m+2)}{m!}$ is divisible by every prime between $m+2$ and $2m+1$ (since all such primes appear in the numerator of the latter expression, but not in the denominator). Since $\binom{2m+1}{m}$ is a positive integer (all binomial coefficients are integers) which contains as factors all the primes between $m+2$ and $2m+1$, we have

$$\prod_{m+2\le p\le2m+1}p\le\binom{2m+1}{m}. \tag{6.5}$$

By the binomial formula,

$$2^{2m+1}=(1+1)^{2m+1}=\sum_{j=0}^{2m+1}\binom{2m+1}{j}\ge\binom{2m+1}{m}+\binom{2m+1}{m+1}=2\binom{2m+1}{m};$$

thus,

$$\binom{2m+1}{m}\le2^{2m}. \tag{6.6}$$

From (6.2), (6.5), and (6.6) we have

$$\theta(2m+1)-\theta(m+1)=\log\prod_{m+2\le p\le2m+1}p\le\log2^{2m}=m\log4. \tag{6.7}$$

From (6.7) and the inductive hypothesis, we have

$$\theta(2m+1)\le\theta(m+1)+m\log4<(m+1)\log4+m\log4=(2m+1)\log4;$$

that is, $\theta(n+1)<(n+1)\log4$. □
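Both ingredients of this proof are easy to check numerically. The Python sketch below (helper names are illustrative) uses a sieve of Eratosthenes to tabulate $\theta(n)/n$ up to $10^5$ and confirms that it stays below $\log4$; it also checks (6.5)–(6.6), namely that the product of the primes in $[m+2,2m+1]$ divides $\binom{2m+1}{m}$ and that $\binom{2m+1}{m}\le4^m$. A finite computation of course proves nothing about all $n$; it only illustrates the statement. It assumes Python 3.8 or later for `math.isqrt`, `math.comb`, and `math.prod`.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

def theta_over_n(n_max):
    """theta(n)/n for n = 1, ..., n_max, with theta(n) = sum_{p <= n} log p as in (6.1)."""
    primes = set(primes_up_to(n_max))
    ratios, theta = [], 0.0
    for n in range(1, n_max + 1):
        if n in primes:
            theta += math.log(n)
        ratios.append(theta / n)
    return ratios

def binomial_step_holds(m):
    """(6.5)-(6.6): the primes in [m+2, 2m+1] divide C(2m+1, m), and C(2m+1, m) <= 4^m."""
    b = math.comb(2 * m + 1, m)
    block = math.prod(p for p in primes_up_to(2 * m + 1) if p >= m + 2)
    return b % block == 0 and b <= 4 ** m

print(max(theta_over_n(10 ** 5)), math.log(4))
```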
As we noted above, the direct asymptotic analysis of $\theta$ is much more tractable than that of $\pi$, and Theorem 6.2 carried out an upper-bound analysis for $\theta$. It turns out that for the lower-bound analysis it is better to work with Chebyshev’s $\psi$-function instead of Chebyshev’s $\theta$-function. One defines

$$\psi(n)=\sum_{p^k\le n,\,k\ge1}\log p. \tag{6.8}$$

That is, in the sum above, a term $\log p$ appears for every prime $p$ and integer $k\ge1$ for which $p^k\le n$. So, for example, $\psi(14)=3\log2+2\log3+\log5+\log7+\log11+\log13$. Of course, $\psi(n)\ge\theta(n)$. We show now that $\theta$ and $\psi$ have the same asymptotic behavior.


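Proposition 6.2.
(i) $\liminf_{n\to\infty}\frac{\theta(n)}{n}=\liminf_{n\to\infty}\frac{\psi(n)}{n}$;
(ii) $\limsup_{n\to\infty}\frac{\theta(n)}{n}=\limsup_{n\to\infty}\frac{\psi(n)}{n}$.

Proof. Since $\psi(n)\ge\theta(n)$, we have

$$\liminf_{n\to\infty}\frac{\theta(n)}{n}\le\liminf_{n\to\infty}\frac{\psi(n)}{n},\qquad\limsup_{n\to\infty}\frac{\theta(n)}{n}\le\limsup_{n\to\infty}\frac{\psi(n)}{n}. \tag{6.9}$$

Since $2^k\le n$ if and only if $k\log2\le\log n$, or equivalently, $k\le\frac{\log n}{\log2}$, it follows that $p^k>n$ for every prime $p$ and every $k>[\frac{\log n}{\log2}]$; thus

$$\psi(n)-\theta(n)=\sum_{p^k\le n,\,k\ge2}\log p=\sum_{k=2}^{[\frac{\log n}{\log2}]}\ \sum_{p\le n^{1/k}}\log p=\sum_{k=2}^{[\frac{\log n}{\log2}]}\theta([n^{1/k}])\le\frac{\log n}{\log2}\,\theta([n^{1/2}]). \tag{6.10}$$

Now trivially, $\theta(k)=\sum_{p\le k}\log p\le k\log k$. Using this with (6.10) gives

$$\psi(n)-\theta(n)\le\frac{\log n}{\log2}\,n^{1/2}\log n^{1/2}=\frac{n^{1/2}\log^2n}{2\log2}. \tag{6.11}$$

From (6.11) it follows that

$$\liminf_{n\to\infty}\frac{\theta(n)}{n}\ge\liminf_{n\to\infty}\frac{\psi(n)}{n},\qquad\limsup_{n\to\infty}\frac{\theta(n)}{n}\ge\limsup_{n\to\infty}\frac{\psi(n)}{n}. \tag{6.12}$$

The proposition follows from (6.9) and (6.12). □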
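A short Python sketch (helper names are illustrative) that computes $\psi(n)-\theta(n)$ directly as the sum of $\log p$ over prime powers $p^k\le n$ with $k\ge2$, compares it with the right-hand side of (6.11), and reproduces the worked value of $\psi(14)$. Computing the difference directly, rather than subtracting two floating-point sums, avoids spurious negative values from rounding.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

def psi_minus_theta(n):
    """psi(n) - theta(n): log p summed over prime powers p^k <= n with k >= 2."""
    total = 0.0
    for p in primes_up_to(math.isqrt(n)):   # p^2 <= n forces p <= sqrt(n)
        pk = p * p
        while pk <= n:
            total += math.log(p)
            pk *= p
    return total

def psi(n):
    """Chebyshev's psi-function (6.8): log p counted once for each k >= 1 with p^k <= n."""
    return sum(math.log(p) for p in primes_up_to(n)) + psi_minus_theta(n)

def bound_6_11(n):
    """Right-hand side of (6.11): sqrt(n) log^2 n / (2 log 2)."""
    return math.sqrt(n) * math.log(n) ** 2 / (2 * math.log(2))

for n in (10, 10 ** 3, 10 ** 5):
    print(n, psi_minus_theta(n), bound_6_11(n))
```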
Remark. The bound obtained in (6.11) can be improved by replacing the trivial bound on $\theta$, namely, $\theta(k)\le k\log k$, by the bound obtained from Theorem 6.2.


We will carry out a lower-bound analysis of $\psi$. This will be somewhat more involved than the upper-bound analysis for $\theta$ but still entirely elementary. For $n\in\mathbb{N}$ and $p$ a prime, let $v_p(n)$ denote the largest exponent $k$ such that $p^k\,|\,n$. One calls $v_p(n)$ the $p$-adic value of $n$. It follows from the definition of $v_p$ that any positive integer $n$ can be written as

$$n=\prod_pp^{v_p(n)}. \tag{6.13}$$

In Exercise 6.1 the reader is asked to prove the following simple formula:

$$v_p(mn)=v_p(m)+v_p(n),\quad m,n\in\mathbb{N}. \tag{6.14}$$

From (6.14) it follows that

$$v_p(n!)=\sum_{m=1}^{n}v_p(m). \tag{6.15}$$


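We will need the following result.

Proposition 6.3.

$$v_p(n!)=\sum_{k=1}^{\infty}\Big[\frac{n}{p^k}\Big].$$

Proof. We can write

$$v_p(m)=\sum_{1\le k<\infty,\ p^k|m}1.$$

Using this with (6.15), we have

$$v_p(n!)=\sum_{m=1}^{n}v_p(m)=\sum_{m=1}^{n}\ \sum_{1\le k<\infty,\ p^k|m}1=\sum_{k=1}^{\infty}\ \sum_{1\le m\le n,\ p^k|m}1. \tag{6.16}$$

If $p^k>n$, then obviously there is no $m\in[n]$ for which $p^k\,|\,m$. If $p^k\le n$, then the integers $m\in[n]$ for which $p^k\,|\,m$ are the $[\frac{n}{p^k}]$ integers $p^k,2p^k,\ldots,[\frac{n}{p^k}]p^k$. Thus, $\sum_{1\le m\le n,\ p^k|m}1=[\frac{n}{p^k}]$. Substituting this in (6.16) completes the proof of the proposition. □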
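Proposition 6.3 (often called Legendre's formula) can be checked against a direct computation of $v_p(n!)$. The sketch below is a minimal Python illustration with hypothetical helper names; `legendre` stops as soon as $p^k>n$, since all later terms vanish.

```python
import math

def v_p(m, p):
    """p-adic value of m >= 1: the largest k with p^k dividing m."""
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k

def legendre(n, p):
    """Proposition 6.3: v_p(n!) = sum_{k >= 1} [n / p^k]; terms vanish once p^k > n."""
    total, pk = 0, p
    while pk <= n:
        total += n // pk
        pk *= p
    return total

print(legendre(100, 5))  # 24
```

For example, $v_5(100!)=[100/5]+[100/25]=20+4=24$, which is the number of trailing zeros of $100!$.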


We can now carry out a lower-bound analysis of w. 
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Theorem 6.3. 


imine 2 
n—>oo n 


> log 2. 


Proof. Consider the binomial coefficient $\binom{2n}{n}=\frac{(2n)!}{(n!)^2}$. Using (6.13) we have
$$\binom{2n}{n}=\frac{(2n)!}{(n!)^2}=\prod_p p^{v_p((2n)!)-2v_p(n!)}=\prod_{p\le 2n}p^{v_p((2n)!)-2v_p(n!)}, \tag{6.17}$$
where the final equality comes from the fact that neither $(2n)!$ nor $n!$ has a prime factor larger than $2n$. From Proposition 6.3, we have


$$v_p((2n)!)-2v_p(n!)=\sum_{k=1}^{\infty}\Big(\Big[\frac{2n}{p^k}\Big]-2\Big[\frac{n}{p^k}\Big]\Big). \tag{6.18}$$

Of course, $[\frac{2n}{p^k}]=[\frac{n}{p^k}]=0$ if $p^k>2n$, that is, if $k>[\frac{\log 2n}{\log p}]$. Thus, in the summation over $k$ above, we may replace the upper limit $\infty$ by $[\frac{\log 2n}{\log p}]$. Furthermore, it is easy to verify that $[2x]-2[x]$ is equal to either 0 or 1, for all real numbers $x$. From these two facts we obtain from (6.18) the estimate
$$0\le v_p((2n)!)-2v_p(n!)\le\Big[\frac{\log 2n}{\log p}\Big]. \tag{6.19}$$


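From (6.17) and (6.19) we have the estimate
$$\binom{2n}{n}\le\prod_{p\le 2n}p^{[\frac{\log 2n}{\log p}]}. \tag{6.20}$$

On the other hand we have the easy estimate
$$\binom{2n}{n}\ge\frac{2^{2n}}{2n}. \tag{6.21}$$

To prove (6.21), note that the middle binomial coefficient $\binom{2n}{n}$ maximizes $\binom{2n}{k}$ over $k\in[2n]$. The reader is asked to prove this in Exercise 6.2. Thus, we have
$$2^{2n}=(1+1)^{2n}=\sum_{k=0}^{2n}\binom{2n}{k}=2+\sum_{k=1}^{2n-1}\binom{2n}{k}\le 2+(2n-1)\binom{2n}{n}\le 2n\binom{2n}{n},$$
where the last inequality uses $\binom{2n}{n}\ge2$.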
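The chain (6.17)–(6.21) can be verified in exact integer arithmetic for small $n$. This is a sketch under our own naming (`primes_up_to`, `v_p_factorial`, `floor_log` and `check_central_binomial` are illustrative, not from the text). `floor_log(m, p)` computes $[\log m/\log p]$ as the largest $k$ with $p^k\le m$, which avoids floating-point trouble at exact prime powers. It requires Python 3.8+ for `math.comb` and `math.isqrt`.

```python
from math import comb, isqrt

def primes_up_to(n):
    """Primes p <= n by trial division (fine for the small n used here)."""
    return [m for m in range(2, n + 1) if all(m % d for d in range(2, isqrt(m) + 1))]

def v_p_factorial(n, p):
    """v_p(n!) = sum over k >= 1 of [n / p^k] (Proposition 6.3)."""
    total, pk = 0, p
    while pk <= n:
        total += n // pk
        pk *= p
    return total

def floor_log(m, p):
    """Largest k with p**k <= m, i.e. [log m / log p], in exact integer arithmetic."""
    k, pk = 0, p
    while pk <= m:
        k += 1
        pk *= p
    return k

def check_central_binomial(n):
    """Check (6.17), (6.19), (6.20) and (6.21) for one value of n >= 1."""
    central = comb(2 * n, n)
    from_exponents, upper = 1, 1
    for p in primes_up_to(2 * n):
        e = v_p_factorial(2 * n, p) - 2 * v_p_factorial(n, p)
        assert 0 <= e <= floor_log(2 * n, p)          # (6.19)
        from_exponents *= p ** e
        upper *= p ** floor_log(2 * n, p)
    assert from_exponents == central                   # (6.17)
    assert central <= upper                            # (6.20)
    assert 4 ** n <= 2 * n * central                   # (6.21), multiplied through by 2n
    return True

print(all(check_central_binomial(n) for n in range(1, 200)))
```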
From (6.20) and (6.21), we conclude that
$$\frac{2^{2n}}{2n}\le\prod_{p\le 2n}p^{[\frac{\log 2n}{\log p}]}$$
or, equivalently,
$$2n\log 2-\log 2n\le\sum_{p\le 2n}\Big[\frac{\log 2n}{\log p}\Big]\log p. \tag{6.22}$$

Recalling from (6.8) that $\psi(2n)=\sum_{p^k\le 2n,\,k\ge1}\log p$, it follows that the summand $\log p$ appears in $\psi(2n)$ one time for each $k\ge1$ that satisfies $p^k\le 2n$; that is, the summand $\log p$ appears $[\frac{\log 2n}{\log p}]$ times. Thus, the right hand side of (6.22) is equal to $\psi(2n)$, giving the inequality


$$\psi(2n)\ge 2n\log 2-\log 2n. \tag{6.23}$$
Since $\psi$ is nondecreasing, we then also have
$$\psi(2n+1)\ge 2n\log 2-\log 2n. \tag{6.24}$$

Dividing (6.23) by $2n$ and dividing (6.24) by $2n+1$, and letting $n\to\infty$, we conclude that
$$\liminf_{n\to\infty}\frac{\psi(n)}{n}\ge\log 2,$$
which completes the proof of the theorem. □
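Inequalities (6.23) and (6.24) can be checked with exact integers by using the standard identity $\psi(m)=\log\operatorname{lcm}(1,2,\dots,m)$. The identity holds because the least common multiple contains each prime $p$ to the power $[\log m/\log p]$, the number of prime powers $p^k\le m$. This is a sketch; `psi_exact` is our own helper name, and `math.lcm` requires Python 3.9+.

```python
import math
from functools import reduce

def psi_exact(m):
    """psi(m) = log lcm(1, 2, ..., m).

    lcm(1, ..., m) contains each prime p to the power [log m / log p], the number of
    prime powers p^k <= m, so its logarithm is exactly the sum defining psi(m) in (6.8).
    """
    return math.log(reduce(math.lcm, range(1, m + 1), 1))

for n in (5, 50, 500):
    lower = 2 * n * math.log(2) - math.log(2 * n)   # right-hand side of (6.23) and (6.24)
    print(n, psi_exact(2 * n) >= lower, psi_exact(2 * n + 1) >= lower, psi_exact(2 * n) / (2 * n))
```

The last column shows $\psi(2n)/(2n)$, which Theorem 6.3 bounds below asymptotically by $\log 2\approx0.693$.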
We can now prove Chebyshev’s theorem in one line. 


Proof of Theorem 6.1. The upper bound follows from Theorem 6.2 and part (ii) of Proposition 6.1, while the lower bound follows from Theorem 6.3, part (i) of Proposition 6.2, and part (i) of Proposition 6.1. □


Exercise 6.1. Prove (6.14): $v_p(mn)=v_p(m)+v_p(n)$, $m,n\in\mathbb{N}$.
Exercise 6.2. Prove that $\binom{2n}{n}=\max_{k\in[2n]}\binom{2n}{k}$.


Exercise 6.3. Bertrand’s postulate states that for each integer $n\ge2$, there exists a prime in the interval $(n,2n)$. This result was first proven by Chebyshev. Use the upper and lower bounds obtained in this chapter for Chebyshev’s $\theta$-function to prove the following weak form of Bertrand’s postulate: For every $\epsilon>0$, there exists an $n_0(\epsilon)$ such that for every $n>n_0(\epsilon)$ there exists a prime in the interval $(n,(2+\epsilon)n)$.
Chapter Notes

Chebyshev also proved that if $\lim_{n\to\infty}\frac{\pi(n)}{n/\log n}$ exists, then this limit must be equal to 1. For a proof, see Tenenbaum’s book [33]. Late in his life, in a letter, Gauss recollected that in the early 1790s, when he was 15 or 16, he conjectured the prime number theorem; however, he never published the conjecture. The theorem was conjectured by Dirichlet in 1838. For some references for further reading, see the notes at the end of Chap. 8.


Chapter 7 
Mertens’ Theorems on the Asymptotic 
Behavior of the Primes 


Given a sequence of positive numbers $\{a_j\}_{j=1}^{\infty}$ satisfying $\lim_{j\to\infty}a_j=\infty$, one way to measure the rate at which the sequence approaches $\infty$ is to consider the rate at which the partial sums of the series $\sum_{j=1}^{\infty}\frac{1}{a_j}$ grow. For $a_j=j$, it is well known that the harmonic series $\sum_{j=1}^{\infty}\frac1j$ satisfies $\sum_{j=1}^{n}\frac1j=\log n+O(1)$ as $n\to\infty$. How does the harmonic series of the primes behave? The goal of this chapter is to prove a theorem known as Mertens’ second theorem.


Theorem 7.1.
$$\sum_{p\le n}\frac1p=\log\log n+O(1),\ \text{as } n\to\infty.$$


Mertens’ second theorem will play a key role in the proof of the Hardy–Ramanujan theorem in Chap. 8. For our proof of Mertens’ second theorem, we will need a result known as Mertens’ first theorem.


Theorem 7.2.
$$\sum_{p\le n}\frac{\log p}{p}=\log n+O(1),\ \text{as } n\to\infty.$$


We now prove Mertens’ two theorems. 


Proof of Mertens’ first theorem. We will analyze the asymptotic behavior of $\log n!$ in two different ways. Comparing the two results will prove the theorem. First we show that
$$\log n!=n\log n+O(n),\ \text{as } n\to\infty. \tag{7.1}$$
We note that (7.1) follows from Stirling’s formula: $n!\sim n^ne^{-n}\sqrt{2\pi n}$. However, we certainly don’t need such a precise estimate of $n!$ to obtain (7.1). We give a quick direct proof of (7.1). Consider an integer $m\ge2$ and $x\in[m-1,m]$. Integrating the inequality $\log(m-1)\le\log x\le\log m$ over $x\in[m-1,m]$ gives
$$\log(m-1)\le\int_{m-1}^{m}\log x\,dx\le\log m,$$
which we rewrite as
$$0\le\log m-\int_{m-1}^{m}\log x\,dx\le\log m-\log(m-1).$$


Summing this inequality from $m=2$ to $m=n$, and noting that the resulting series on the right hand side is telescopic, we obtain
$$0\le\log n!-\int_1^n\log x\,dx\le\log n. \tag{7.2}$$

An integration by parts shows that $\int_1^n\log x\,dx=n\log n-n+1$. Substituting this in (7.2) gives
$$n\log n-n+1\le\log n!\le n\log n-n+1+\log n,$$
which completes the proof of (7.1).
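The elementary bounds just obtained can be compared with $\log n!$ numerically. In this sketch, `log_factorial` and `stirling_free_bounds` are illustrative names of our own. We assume `math.lgamma(n + 1)` as a convenient way to get $\log n!$ without forming the huge integer $n!$.

```python
import math

def log_factorial(n):
    """log n! computed as lgamma(n + 1), avoiding the huge integer n!."""
    return math.lgamma(n + 1)

def stirling_free_bounds(n):
    """The two sides of the inequality following (7.2): n log n - n + 1 and that plus log n."""
    lower = n * math.log(n) - n + 1
    return lower, lower + math.log(n)

for n in (10, 1000, 10**6):
    lo, hi = stirling_free_bounds(n)
    print(n, lo <= log_factorial(n) <= hi, (log_factorial(n) - n * math.log(n)) / n)
```

The last column stays bounded (it tends to $-1$), which is exactly the $O(n)$ error term in (7.1).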

To analyze $\log n!$ in another way, we utilize the function $v_p$ introduced in Chap. 6. Recall that $v_p(n)$, the $p$-adic value of $n$, is equal to the largest exponent $k$ such that $p^k\mid n$ and that by the definition of $v_p$ we have $n=\prod_p p^{v_p(n)}=\prod_{p\le m}p^{v_p(n)}$, for any integer $m$ that is greater than or equal to the largest prime divisor of $n$. Recall that Proposition 6.3 states that
$$v_p(n!)=\sum_{k=1}^{\infty}\Big[\frac{n}{p^k}\Big].$$


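Thus, we have
$$n!=\prod_{p\le n}p^{v_p(n!)}=\prod_{p\le n}p^{\sum_{k=1}^{\infty}[\frac{n}{p^k}]},$$
and
$$\log n!=\sum_{p\le n}\Big(\sum_{k=1}^{\infty}\Big[\frac{n}{p^k}\Big]\Big)\log p=\sum_{p\le n}\Big[\frac np\Big]\log p+\sum_{p\le n}\Big(\sum_{k=2}^{\infty}\Big[\frac{n}{p^k}\Big]\Big)\log p. \tag{7.3}$$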
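We now analyze the two terms on the right hand side of (7.3), beginning with the second term. We have
$$\sum_{k=2}^{\infty}\Big[\frac{n}{p^k}\Big]\le n\sum_{k=2}^{\infty}\frac{1}{p^k}=\frac{n}{p(p-1)}.$$
Thus, we obtain
$$\sum_{p\le n}\Big(\sum_{k=2}^{\infty}\Big[\frac{n}{p^k}\Big]\Big)\log p\le n\sum_{p\le n}\frac{\log p}{p(p-1)}\le Cn, \tag{7.4}$$
for some constant $C>0$, the latter inequality following from the fact that $\sum_{p}\frac{\log p}{p(p-1)}\le\sum_{m=2}^{\infty}\frac{\log m}{m(m-1)}<\infty$. We write the first term on the right hand side of (7.3) as
$$\sum_{p\le n}\Big[\frac np\Big]\log p=n\sum_{p\le n}\frac{\log p}{p}-\sum_{p\le n}\Big(\frac np-\Big[\frac np\Big]\Big)\log p. \tag{7.5}$$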


Recalling that Theorem 6.2 gives $\theta(n)\le(\log 4)n$, we can estimate the second term on the right hand side of (7.5) by
$$0\le\sum_{p\le n}\Big(\frac np-\Big[\frac np\Big]\Big)\log p\le\sum_{p\le n}\log p=\theta(n)\le(\log 4)n. \tag{7.6}$$
From (7.3)–(7.6), we conclude that
$$\log n!=n\sum_{p\le n}\frac{\log p}{p}+O(n),\ \text{as } n\to\infty. \tag{7.7}$$

Comparing (7.1) with (7.7) allows us to conclude that $\sum_{p\le n}\frac{\log p}{p}=\log n+O(1)$, completing the proof of Mertens’ first theorem. □
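Theorem 7.2 can be watched numerically: the difference $\sum_{p\le n}\frac{\log p}{p}-\log n$ stays bounded, and in fact settles near $-1.33$. This is a sketch; `primes_up_to` (a sieve of Eratosthenes) and `mertens_first_error` are our own illustrative helpers.

```python
import math

def primes_up_to(n):
    """Primes p <= n via the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if is_prime[i]]

def mertens_first_error(n):
    """sum_{p <= n} (log p)/p - log n, which Theorem 7.2 says stays bounded."""
    return sum(math.log(p) / p for p in primes_up_to(n)) - math.log(n)

for n in (10, 10**3, 10**5, 10**6):
    print(n, round(mertens_first_error(n), 4))
```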


In order to use Mertens’ first theorem to prove his second theorem, we need to 
introduce Abel summation, a tool that is used extensively in number theory. Abel 
summation is a discrete version of integration by parts. It appears in a variety of 
guises, the following of which is the most suitable in the present context. 


Proposition 7.1 (Abel Summation). Let $j_0,n\in\mathbb{Z}$ with $j_0<n$. Let $a:[j_0,n]\cap\mathbb{Z}\to\mathbb{R}$, and let $A:[j_0,n]\to\mathbb{R}$ be defined by $A(t)=\sum_{k=j_0}^{[t]}a(k)$. Let $f:[j_0,n]\to\mathbb{R}$ be continuously differentiable. Then
$$\sum_{j_0<r\le n}a(r)f(r)=A(n)f(n)-A(j_0)f(j_0)-\int_{j_0}^{n}A(t)f'(t)\,dt. \tag{7.8}$$
Remark. Since $A(j_0)=a(j_0)$, we could also write the above formula in the more compact form
$$\sum_{j_0\le r\le n}a(r)f(r)=A(n)f(n)-\int_{j_0}^{n}A(t)f'(t)\,dt. \tag{7.9}$$

The form in the proposition of course mimics the standard integration by parts formula.


Proof. Since $A$ is constant between integers, we have
$$\int_{j_0}^{n}A(t)f'(t)\,dt=\sum_{r=j_0}^{n-1}\int_r^{r+1}A(t)f'(t)\,dt=\sum_{r=j_0}^{n-1}A(r)\big(f(r+1)-f(r)\big). \tag{7.10}$$


Substituting for $A$ in the last term on the right hand side, and interchanging the order of the resulting summation, we obtain
$$\begin{aligned}\sum_{r=j_0}^{n-1}A(r)\big(f(r+1)-f(r)\big)&=\sum_{r=j_0}^{n-1}\Big(\sum_{k=j_0}^{r}a(k)\Big)\big(f(r+1)-f(r)\big)=\sum_{k=j_0}^{n-1}a(k)\sum_{r=k}^{n-1}\big(f(r+1)-f(r)\big)\\&=\sum_{k=j_0}^{n-1}a(k)\big(f(n)-f(k)\big)=A(n-1)f(n)-\sum_{k=j_0}^{n-1}a(k)f(k).\end{aligned} \tag{7.11}$$
From (7.10) and (7.11) we obtain
$$\int_{j_0}^{n}A(t)f'(t)\,dt=A(n-1)f(n)-\sum_{k=j_0}^{n-1}a(k)f(k).$$

Substituting this in the right hand side of (7.8), and using $A(n)-A(n-1)=a(n)$ and $A(j_0)=a(j_0)$, gives
$$A(n)f(n)-A(j_0)f(j_0)-\int_{j_0}^{n}A(t)f'(t)\,dt=A(n)f(n)-A(j_0)f(j_0)-A(n-1)f(n)+\sum_{k=j_0}^{n-1}a(k)f(k)=\sum_{k=j_0+1}^{n}a(k)f(k), \tag{7.12}$$
which proves the proposition. □
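Because $A$ is a step function, (7.10) evaluates the integral in (7.8) exactly, so the proposition can be sanity-checked in exact rational arithmetic. The sketch below checks the algebra in (7.10)–(7.12), not the calculus step behind (7.10). The functions `abel_lhs` and `abel_rhs` and the sample choices of $a$ and $f$ are our own illustrations.

```python
from fractions import Fraction

def abel_lhs(a, f, j0, n):
    """Left side of (7.8): the sum of a(r) f(r) over j0 < r <= n."""
    return sum(a(r) * f(r) for r in range(j0 + 1, n + 1))

def abel_rhs(a, f, j0, n):
    """Right side of (7.8), with the integral replaced by the sum in (7.10).

    On [r, r+1) the step function A equals A(r), so
    int_{j0}^{n} A(t) f'(t) dt = sum_{r=j0}^{n-1} A(r) (f(r+1) - f(r)) exactly.
    """
    A, running = {}, 0
    for k in range(j0, n + 1):
        running += a(k)
        A[k] = running                      # A(k) = a(j0) + ... + a(k)
    integral = sum(A[r] * (f(r + 1) - f(r)) for r in range(j0, n))
    return A[n] * f(n) - A[j0] * f(j0) - integral

# Example with exact rational arithmetic: a(k) = (-1)^k k and f(t) = 1/(t^2 + 1).
a = lambda k: (-1) ** k * k
f = lambda t: Fraction(1, t * t + 1)
print(abel_lhs(a, f, 2, 20) == abel_rhs(a, f, 2, 20))   # True
```

Using `Fraction` rather than floats makes the two sides agree exactly, so equality (not approximate closeness) is the meaningful test.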
Proof of Mertens’ second theorem. Let
$$a(n)=\begin{cases}\frac{\log p}{p},&\text{if } n=p \text{ is prime};\\0,&\text{otherwise},\end{cases}$$
and let
$$f(t)=\frac{1}{\log t},\quad t>1.$$


We use Abel summation in the form (7.9) with $j_0=2$. By Mertens’ first theorem, we have
$$A(t)=\sum_{k=2}^{[t]}a(k)=\sum_{p\le t}\frac{\log p}{p}=\log t+O(1),\ \text{as } t\to\infty. \tag{7.13}$$

Thus, we obtain from (7.9) and (7.13),
$$\sum_{p\le n}\frac1p=\sum_{p\le n}\frac{\log p}{p}\cdot\frac{1}{\log p}=\sum_{2\le r\le n}a(r)f(r)=A(n)f(n)-\int_2^nA(t)f'(t)\,dt=\frac{\log n+O(1)}{\log n}+\int_2^n\frac{\log t+O(1)}{t(\log t)^2}\,dt. \tag{7.14}$$
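We have
$$\int_2^n\frac{1}{t\log t}\,dt=\log\log t\Big|_2^n=\log\log n-\log\log 2,$$
and since $\frac{d}{dt}\Big(-\frac{1}{\log t}\Big)=\frac{1}{t(\log t)^2}$, we have
$$\int_2^{\infty}\frac{1}{t(\log t)^2}\,dt<\infty.$$
Using these two facts in (7.14) gives
$$\sum_{p\le n}\frac1p=\log\log n+O(1),\ \text{as } n\to\infty, \tag{7.15}$$
completing the proof of Mertens’ second theorem. □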
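As with the first theorem, the $O(1)$ term in (7.15) can be observed directly. The sketch below reuses a sieve under our own helper names. The printed differences $\sum_{p\le n}\frac1p-\log\log n$ decrease slowly toward the Meissel–Mertens constant, approximately $0.2615$.

```python
import math

def primes_up_to(n):
    """Primes p <= n via the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if is_prime[i]]

def mertens_second_error(n):
    """sum_{p <= n} 1/p - log log n (n >= 2), bounded by Theorem 7.1."""
    return sum(1 / p for p in primes_up_to(n)) - math.log(math.log(n))

for n in (10, 10**3, 10**5, 10**6):
    print(n, round(mertens_second_error(n), 4))
```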
Exercise 7.1. (a) Use Mertens’ first theorem and Abel summation to prove that
$$\sum_{p\le n}\frac{\log^2 p}{p}=\frac12\log^2 n+O(\log n).$$
(Hint: Write $\sum_{p\le n}\frac{\log^2 p}{p}=\sum_{2\le r\le n}a(r)\log r$, where $a(r)$ is as in the proof of Mertens’ second theorem.)
(b) Use induction and the result in (a) to prove that
$$\sum_{p\le n}\frac{\log^k p}{p}=\frac1k\log^k n+O(\log^{k-1}n),$$
for all positive integers $k$.


Exercise 7.2. Proposition 6.1 in Chap. 6 showed that the two statements $\theta(n)=\sum_{p\le n}\log p\sim n$ and $\pi(n)=\sum_{p\le n}1\sim\frac{n}{\log n}$ can easily be derived one from the other. However, the prime number theorem cannot be derived from Mertens’ second theorem. Derive Mertens’ second theorem in the form $\sum_{p\le n}\frac1p\sim\log\log n$ from the prime number theorem, $\pi(n)\sim\frac{n}{\log n}$. (Hint: Use Abel summation.)


Chapter Notes 


The two theorems in this chapter were proven by F. Mertens in 1874. For some 
references for further reading, see the notes at the end of Chap. 8. 


Chapter 8
The Hardy–Ramanujan Theorem
on the Number of Distinct Prime Divisors


Let $\omega(n)$ denote the number of distinct prime divisors of $n$; that is,
$$\omega(n)=\sum_{p\mid n}1.$$

Thus, for example, $\omega(1)=0$, $\omega(2)=1$, $\omega(9)=1$, $\omega(60)=3$. The values of $\omega(n)$ obviously fluctuate wildly as $n\to\infty$, since $\omega(p)=1$ for every prime $p$, while $\omega(n)$ is unbounded (the product of the first $k$ primes has exactly $k$ distinct prime divisors). However, there are not very many prime numbers, in the sense that the asymptotic density of the primes is 0. In this chapter we prove the Hardy–Ramanujan theorem, which in colloquial language states that “almost every” integer $n$ has “approximately” $\log\log n$ distinct prime divisors. The meaning of “almost every” is that the asymptotic density of those integers $n$ for which the number of distinct prime divisors is not “approximately” $\log\log n$ is zero. The meaning of “approximately” is that the actual number of distinct prime divisors of $n$ falls in the interval $[\log\log n-(\log\log n)^{\frac12+\delta},\ \log\log n+(\log\log n)^{\frac12+\delta}]$, where $\delta>0$ is arbitrarily small.


Theorem 8.1 (Hardy–Ramanujan). For every $\delta>0$,
$$\lim_{N\to\infty}\frac{|\{n\in[N]:|\omega(n)-\log\log n|\le(\log\log n)^{\frac12+\delta}\}|}{N}=1. \tag{8.1}$$


Remark. From the proof of the theorem, it is very easy to infer that the statement of the theorem is equivalent to the following statement: For every $\delta>0$,
$$\lim_{N\to\infty}\frac{|\{n\in[N]:|\omega(n)-\log\log N|\le(\log\log N)^{\frac12+\delta}\}|}{N}=1.$$
While the statement of the theorem is probably more aesthetically pleasing than this latter statement, the latter statement is more practical. Thus, for example, take $\delta=0.1$. Then for sufficiently large $n$, a very high percentage of the positive integers up to the astronomical number $N=e^{e^n}$ will have between $n-n^{0.6}$ and $n+n^{0.6}$ distinct prime factors. Let $n=10^9$. We leave it to the interested reader to estimate the $O(1)$ terms appearing in the proofs of Mertens’ theorems, to keep track of how they appear in the proof of the Hardy–Ramanujan theorem below, and to conclude that over ninety percent of the positive integers up to $N=e^{e^{10^9}}$ have between $10^9-(10^9)^{0.6}$ and $10^9+(10^9)^{0.6}$ distinct prime factors. That is, over ninety percent of the positive integers up to $e^{e^{10^9}}$ have between $10^9-251{,}188$ and $10^9+251{,}188$ distinct prime factors.
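The half-width $251{,}188$ quoted above is just $(10^9)^{0.6}=10^{5.4}\approx251{,}188.6$ rounded down. With $N=e^{e^n}$ we have $\log\log N=n$. A one-line check follows; the variable names are our own.

```python
n = 10**9                        # plays the role of log log N, with N = e^(e^n)
delta = 0.1
half_width = n ** (0.5 + delta)  # (log log N)^(1/2 + delta) = n^0.6
print(int(half_width))           # 251188
print(n - int(half_width), n + int(half_width))
```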


Our proof of the Hardy–Ramanujan theorem will have a probabilistic flavor. For any positive integer $N$, let $P_N$ denote the uniform probability measure on $[N]$; that is, $P_N(\{j\})=\frac1N$ for $j\in[N]$. Then we may think of the distinct prime divisor function $\omega=\omega(n)$ as a random variable on the space $[N]$ with the probability measure $P_N$. For the sequel, note that when we write $P_N(\omega\in A)$, where $A\subset[N]$, what we mean is
$$P_N(\omega\in A)=P_N(\{n\in[N]:\omega(n)\in A\})=\frac{|\{n\in[N]:\omega(n)\in A\}|}{N}.$$


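Let $E_N$ denote the expected value with respect to the measure $P_N$. The expected value of $\omega$ is given by
$$E_N\omega=\frac1N\sum_{n=1}^{N}\omega(n). \tag{8.2}$$

The second moment of $\omega$ is given by
$$E_N\omega^2=\frac1N\sum_{n=1}^{N}\omega^2(n). \tag{8.3}$$

The variance $\mathrm{Var}_N(\omega)$ of $\omega$ is defined by
$$\mathrm{Var}_N(\omega)=E_N(\omega-E_N\omega)^2=E_N\omega^2-(E_N\omega)^2. \tag{8.4}$$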


We will prove the Hardy–Ramanujan theorem by applying Chebyshev’s inequality to the random variable $\omega$:
$$P_N(|\omega-E_N\omega|\ge\lambda)\le\frac{\mathrm{Var}_N(\omega)}{\lambda^2},\quad\text{for }\lambda>0. \tag{8.5}$$
In order to implement this, we need to calculate $E_N\omega$ and $\mathrm{Var}_N(\omega)$ or, equivalently, $E_N\omega$ and $E_N\omega^2$. The next two theorems give the asymptotic behavior as $N\to\infty$ of $E_N\omega$ and of $E_N\omega^2$. The proofs of these two theorems will use Mertens’ second theorem.

Theorem 8.2.
$$E_N\omega=\log\log N+O(1),\ \text{as }N\to\infty.$$


Remark. Recall the definition of the average order of an arithmetic function, given in the remark following the number-theoretic proof of Theorem 2.1. Theorem 8.2 shows that the average order of $\omega$, the function counting the number of distinct prime divisors, is given by the function $\log\log n$.


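Proof. From the definition of $\omega$ we have
$$\sum_{n=1}^{N}\omega(n)=\sum_{n=1}^{N}\sum_{p\mid n}1=\sum_{p\le N}\ \sum_{n\le N,\ p\mid n}1=\sum_{p\le N}\Big[\frac Np\Big]=N\sum_{p\le N}\frac1p-\sum_{p\le N}\Big(\frac Np-\Big[\frac Np\Big]\Big). \tag{8.6}$$

The second term above satisfies the inequality
$$0\le\sum_{p\le N}\Big(\frac Np-\Big[\frac Np\Big]\Big)\le\sum_{p\le N}1=\pi(N)\le N. \tag{8.7}$$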
(We could use Chebyshev’s theorem (Theorem 6.1) to get the better bound $O(\frac{N}{\log N})$ on the right hand side above, but that wouldn’t improve the order of the final bound we obtain for $E_N\omega$.) Mertens’ second theorem (Theorem 7.1) gives
$$\sum_{p\le N}\frac1p=\log\log N+O(1),\ \text{as }N\to\infty. \tag{8.8}$$

From (8.6)–(8.8), we obtain
$$\sum_{n=1}^{N}\omega(n)=N\log\log N+O(N),\ \text{as }N\to\infty,$$
and dividing this by $N$ gives
$$E_N\omega=\log\log N+O(1),\ \text{as }N\to\infty, \tag{8.9}$$
completing the proof of the theorem. □
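The identity (8.6) and the conclusion of Theorem 8.2 can be checked with a sieve that counts distinct prime divisors. This is a sketch. `omega_sieve` and `mean_omega` are our own names. The sieve relies on the fact that when the loop reaches $p$, `omega[p] == 0` exactly when no smaller prime divides $p$.

```python
import math

def omega_sieve(N):
    """Return (omega, primes): omega[n] = number of distinct primes dividing n for
    0 <= n <= N (omega[0] is an unused placeholder), and the list of primes <= N."""
    omega, primes = [0] * (N + 1), []
    for p in range(2, N + 1):
        if omega[p] == 0:                  # no smaller prime divides p, so p is prime
            primes.append(p)
            for m in range(p, N + 1, p):
                omega[m] += 1
    return omega, primes

def mean_omega(N):
    """E_N omega = (1/N) sum_{n=1}^N omega(n), as in (8.2)."""
    omega, _ = omega_sieve(N)
    return sum(omega[1:]) / N

N = 10**4
omega, primes = omega_sieve(N)
# (8.6): the sum of omega(n) over n <= N equals the sum of [N/p] over primes p <= N
print(sum(omega[1:]) == sum(N // p for p in primes))
for M in (10**3, 10**4, 10**5):
    print(M, round(mean_omega(M), 4), round(math.log(math.log(M)), 4))
```

The gap between the last two columns is the bounded $O(1)$ term of Theorem 8.2.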
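Theorem 8.3.
$$E_N\omega^2=(\log\log N)^2+O(\log\log N),\ \text{as }N\to\infty.$$
Remark. To prove the Hardy–Ramanujan theorem, we only need the upper bound
$$E_N\omega^2\le(\log\log N)^2+O(\log\log N),\ \text{as }N\to\infty. \tag{8.10}$$

Proof. Since, for distinct primes $p_1,p_2$, we have $p_1\mid n$ and $p_2\mid n$ if and only if $p_1p_2\mid n$,
$$\omega^2(n)=\Big(\sum_{p\mid n}1\Big)^2=\sum_{p_1\mid n}\ \sum_{p_2\mid n}1=\sum_{\substack{p_1p_2\mid n\\p_1\ne p_2}}1+\sum_{p\mid n}1=\sum_{\substack{p_1p_2\mid n\\p_1\ne p_2}}1+\omega(n). \tag{8.11}$$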


Thus,
$$\sum_{n=1}^{N}\omega^2(n)=\sum_{n=1}^{N}\ \sum_{\substack{p_1p_2\mid n\\p_1\ne p_2}}1+\sum_{n=1}^{N}\omega(n). \tag{8.12}$$

The second term on the right hand side of (8.12) can be estimated by Theorem 8.2, giving
$$\sum_{n=1}^{N}\omega(n)=NE_N\omega=N\log\log N+O(N),\ \text{as }N\to\infty. \tag{8.13}$$


To estimate the first term on the right hand side of (8.12), we write
$$\sum_{n=1}^{N}\ \sum_{\substack{p_1p_2\mid n\\p_1\ne p_2}}1=\sum_{\substack{p_1p_2\le N\\p_1\ne p_2}}\ \sum_{\substack{n\le N\\p_1p_2\mid n}}1=\sum_{\substack{p_1p_2\le N\\p_1\ne p_2}}\Big[\frac{N}{p_1p_2}\Big]=N\sum_{\substack{p_1p_2\le N\\p_1\ne p_2}}\frac{1}{p_1p_2}-\sum_{\substack{p_1p_2\le N\\p_1\ne p_2}}\Big(\frac{N}{p_1p_2}-\Big[\frac{N}{p_1p_2}\Big]\Big). \tag{8.14}$$

The number of ordered pairs of distinct primes $(p_1,p_2)$ such that $p_1p_2\le N$ is of course equal to twice the number of such unordered pairs $\{p_1,p_2\}$. The fundamental theorem of arithmetic states that each integer has a unique factorization into primes; thus, if $p_1p_2=p_3p_4$, then necessarily $\{p_1,p_2\}=\{p_3,p_4\}$. Consequently the number of unordered pairs $\{p_1,p_2\}$ such that $p_1p_2\le N$ is certainly no greater than $N$. Thus, the second term on the right hand side of (8.14) satisfies
$$0\le\sum_{\substack{p_1p_2\le N\\p_1\ne p_2}}\Big(\frac{N}{p_1p_2}-\Big[\frac{N}{p_1p_2}\Big]\Big)\le\sum_{\substack{p_1p_2\le N\\p_1\ne p_2}}1\le 2N. \tag{8.15}$$
Using Mertens’ second theorem for the second inequality below, we bound from above the summation in the first term on the right hand side of (8.14) by
$$\sum_{\substack{p_1p_2\le N\\p_1\ne p_2}}\frac{1}{p_1p_2}\le\Big(\sum_{p\le N}\frac1p\Big)^2\le(\log\log N+O(1))^2,\ \text{as }N\to\infty. \tag{8.16}$$

From (8.12)–(8.16), we conclude that (8.10) holds.

To complete the proof of the theorem, we need to show (8.10) with the reverse inequality. The easiest way to do this is to note simply that the variance is a nonnegative quantity. Thus,
$$E_N\omega^2\ge(E_N\omega)^2=(\log\log N+O(1))^2=(\log\log N)^2+O(\log\log N),$$
where the first equality follows from Theorem 8.2. For an alternative proof, see Exercise 8.1. □
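The decomposition (8.11)–(8.14) gives an exact identity that is easy to test: $\sum_{n\le N}\omega^2(n)$ equals the sum of $[\frac{N}{p_1p_2}]$ over ordered pairs of distinct primes with $p_1p_2\le N$, plus $\sum_{n\le N}\omega(n)$. This is a sketch with our own helper names. The pair loop stops early, since $p_1p_2\le N$ with $p_2>p_1$ forces $p_1^2<N$.

```python
import math

def omega_sieve(N):
    """omega[n] = number of distinct primes dividing n (omega[0] unused), plus the primes <= N."""
    omega, primes = [0] * (N + 1), []
    for p in range(2, N + 1):
        if omega[p] == 0:                      # no smaller prime divides p, so p is prime
            primes.append(p)
            for m in range(p, N + 1, p):
                omega[m] += 1
    return omega, primes

def sum_omega_squared_two_ways(N):
    """Return (direct, via_pairs) for sum_{n <= N} omega(n)^2.

    direct squares the sieved values; via_pairs evaluates the right side of (8.12),
    computing its first term as in (8.14): the sum of [N / (p1 p2)] over ordered
    pairs of distinct primes with p1 p2 <= N.
    """
    omega, primes = omega_sieve(N)
    direct = sum(w * w for w in omega[1:])
    pair_term = 0
    for i, p1 in enumerate(primes):
        if p1 * p1 >= N:
            break                              # then p1 * p2 > N for every prime p2 > p1
        for p2 in primes[i + 1:]:
            if p1 * p2 > N:
                break
            pair_term += 2 * (N // (p1 * p2))  # counts (p1, p2) and (p2, p1)
    return direct, pair_term + sum(omega[1:])

N = 10**5
direct, via_pairs = sum_omega_squared_two_ways(N)
omega, _ = omega_sieve(N)
mean = sum(omega[1:]) / N
variance = direct / N - mean ** 2              # Var_N(omega) from (8.4)
print(direct == via_pairs, variance, math.log(math.log(N)))
```

The printed variance is of the same order as $\log\log N$, in line with (8.17).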


We now use Chebyshev’s inequality along with the estimates in Theorems 8.2 and 8.3 to prove the Hardy–Ramanujan theorem.

Proof of Theorem 8.1. From Theorems 8.2 and 8.3 we have
$$\mathrm{Var}_N(\omega)=E_N\omega^2-(E_N\omega)^2=(\log\log N)^2+O(\log\log N)-(\log\log N+O(1))^2=O(\log\log N),\ \text{as }N\to\infty. \tag{8.17}$$


Theorem 8.2 gives
$$E_N\omega=\log\log N+R_N,\ \text{where } R_N \text{ is bounded as } N\to\infty. \tag{8.18}$$

Applying Chebyshev’s inequality with $\lambda=(\log\log N)^{\frac12+\delta}$, where $\delta>0$, we obtain from (8.5), (8.17), and (8.18)
$$P_N\big(|\omega-\log\log N-R_N|\ge(\log\log N)^{\frac12+\delta}\big)\le\frac{O(\log\log N)}{(\log\log N)^{1+2\delta}}\to0,\ \text{as }N\to\infty.$$
Thus,
$$\lim_{N\to\infty}P_N\big(|\omega-\log\log N-R_N|<(\log\log N)^{\frac12+\delta}\big)=1. \tag{8.19}$$

Translating (8.19) back to the notation in the statement of the theorem, we have for every $\delta>0$
$$\lim_{N\to\infty}\frac{|\{n\in[N]:|\omega(n)-\log\log N-R_N|<(\log\log N)^{\frac12+\delta}\}|}{N}=1. \tag{8.20}$$
The main difference between (8.20) and the statement of the Hardy–Ramanujan theorem is that $\log\log N$ appears in (8.20) and $\log\log n$ appears in (8.1). Because $\log\log x$ is such a slowly varying function, this difference is not very significant. The remainder of the proof consists of showing that if (8.20) holds for all $\delta>0$, then (8.1) also holds for all $\delta>0$.

Fix an arbitrary $\delta>0$. Using the fact that (8.20) holds with $\delta$ replaced by $\frac\delta2$, we will show that (8.1) holds for $\delta$. This will then complete the proof of the theorem. The term $R_N$ in (8.20) may vary with $N$, but it is bounded in absolute value, say by $M$. For $N^{\frac12}\le n\le N$, we have


$$0\le\log\log N-\log\log n\le\log\log N-\log\log N^{\frac12}=\log 2. \tag{8.21}$$

Therefore, writing $\omega(n)-\log\log n=(\omega(n)-\log\log N-R_N)+(\log\log N-\log\log n)+R_N$, the triangle inequality and (8.21) give
$$|\omega(n)-\log\log n|\le|\omega(n)-\log\log N-R_N|+\log 2+M,\quad\text{for }N^{\frac12}\le n\le N. \tag{8.22}$$
Using (8.20) with $\delta$ replaced by $\frac\delta2$, along with (8.22) and the fact that $\lim_{N\to\infty}\frac{N^{1/2}}{N}=0$, we have
$$\lim_{N\to\infty}\frac{|\{n\in[N]:|\omega(n)-\log\log n|<(\log\log N)^{\frac12+\frac\delta2}+\log 2+M\}|}{N}=1. \tag{8.23}$$


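By (8.21), it follows that $(\log\log n)^{\frac12+\delta}\ge(\log\log N-\log 2)^{\frac12+\delta}$, for $N^{\frac12}\le n\le N$. Clearly, we have
$$(\log\log N-\log 2)^{\frac12+\delta}\ge(\log\log N)^{\frac12+\frac\delta2}+\log 2+M,\ \text{for sufficiently large }N.$$
Thus,
$$(\log\log n)^{\frac12+\delta}\ge(\log\log N)^{\frac12+\frac\delta2}+\log 2+M,\quad\text{for }N^{\frac12}\le n\le N\text{ and sufficiently large }N. \tag{8.24}$$
From (8.23), (8.24), and the fact that $\lim_{N\to\infty}\frac{N^{1/2}}{N}=0$, we conclude that
$$\lim_{N\to\infty}\frac{|\{n\in[N]:|\omega(n)-\log\log n|\le(\log\log n)^{\frac12+\delta}\}|}{N}=1.$$
□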
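For moderate $N$ one can count directly how many $n\in[N]$ satisfy the condition in (8.1). The sketch below does this with $\delta=0.1$, the value used in the earlier numerical remark. Because $\log\log N$ grows so slowly, numbers at this scale only illustrate the theorem and prove nothing. `hardy_ramanujan_fraction` is our own helper. The values $n=1,2$, where $\log\log n$ is undefined or negative, are simply counted as failures.

```python
import math

def omega_sieve(N):
    """omega[n] = number of distinct primes dividing n, for 0 <= n <= N (omega[0] unused)."""
    omega = [0] * (N + 1)
    for p in range(2, N + 1):
        if omega[p] == 0:                  # no smaller prime divides p, so p is prime
            for m in range(p, N + 1, p):
                omega[m] += 1
    return omega

def hardy_ramanujan_fraction(N, delta):
    """Fraction of n in [N] with |omega(n) - log log n| <= (log log n)^(1/2 + delta), as in (8.1).

    n = 1, 2 (where log log n is undefined or negative) are counted as failures,
    which can only lower the fraction."""
    omega = omega_sieve(N)
    good = 0
    for n in range(3, N + 1):
        x = math.log(math.log(n))          # x > 0 for n >= 3
        if abs(omega[n] - x) <= x ** (0.5 + delta):
            good += 1
    return good / N

for N in (10**3, 10**4, 10**5):
    print(N, hardy_ramanujan_fraction(N, 0.1))
```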


Exercise 8.1. Prove the lower bound
$$E_N\omega^2\ge(\log\log N)^2+O(\log\log N)$$
by using (8.12)–(8.15) and an inequality that begins with $\sum_{p_1p_2\le N,\ p_1\ne p_2}\frac{1}{p_1p_2}\ge\sum_{p_1,p_2\le\sqrt N,\ p_1\ne p_2}\frac{1}{p_1p_2}$.
Exercise 8.2. Let $\Omega(n)$ denote the number of prime divisors of $n$, counted with repetitions. Thus, if the prime factorization of $n$ is given by $n=\prod_{i=1}^{m}p_i^{k_i}$, then $\omega(n)=m$, but $\Omega(n)=\sum_{i=1}^{m}k_i$. Use the method of proof in Theorem 8.2 to prove that
$$E_N\Omega=\frac1N\sum_{n=1}^{N}\Omega(n)=\log\log N+O(1),\ \text{as }N\to\infty.$$


Exercise 8.3. Let $d(n)$ denote the number of divisors of $n$. Thus, $d(12)=6$ because the divisors of 12 are 1, 2, 3, 4, 6, 12. Show that
$$\frac1n\sum_{j=1}^{n}d(j)=\log n+O(1).$$
This shows that the average order of the divisor function is the function $\log n$. Recall from the remark after Theorem 8.2 that the average order of $\omega(n)$, the function counting the number of distinct prime divisors, is the function $\log\log n$. (Hint: We have $d(k)=\sum_{m\mid k}1$, so $\sum_{k=1}^{n}d(k)=\sum_{k=1}^{n}\sum_{m\mid k}1=\sum_{m=1}^{n}\sum_{k\le n,\ m\mid k}1$.)


Chapter Notes 


The theorem of G. H. Hardy and S. Ramanujan was proved in 1917. The proof we give is along the lines of the 1934 proof of P. Turán, which is much simpler than the original proof. For more on multiplicative number theory and primes, the subject of the material in Chaps. 6–8, the reader is referred to Nathanson’s book [27] and to the more advanced treatment of Tenenbaum in [33]. In [27] one can find a proof of the prime number theorem by “elementary” methods. For very accessible books on analytic number theory and a proof of the prime number theorem using analytic function theory, see, for example, Apostol’s book [5] or Jameson’s book [25]. For a somewhat more advanced treatment, see the book of Montgomery and Vaughan [26]. One can also find a proof of the prime number theorem using analytic function theory, as well as a whole trove of sophisticated material, in [33].


Chapter 9 

The Largest Clique in a Random Graph 
and Applications to Tampering Detection 
and Ramsey Theory 


9.1 Graphs and Random Graphs: Basic Definitions 


A finite graph $G$ is a pair $(V,E)$, where $V$ is a finite set of vertices and $E$ is a subset of $V^{(2)}$, the set of unordered pairs of elements of $V$. The elements of $E$ are called edges. (This is what graph theorists call a simple graph. That is, there are no loops, namely edges connecting a vertex to itself, and there are no multiple edges, namely more than one edge connecting the same pair of vertices.) If $x,y\in V$ and the pair $\{x,y\}\in E$, then we say that an edge joins the vertices $x$ and $y$; otherwise, we say that there is no edge joining $x$ and $y$. If $|V|=n$, then $|V^{(2)}|=\binom n2=\frac12n(n-1)$. The size of the graph is the number of vertices it contains, that is, $|V|$. We will identify the vertex set $V$ of a graph of size $n$ with $[n]$. The graph $G=(V,E)$ with $|V|=n$ and $E=V^{(2)}$ is called the complete graph of size $n$ and is henceforth denoted by $K_n$. This graph has $n$ vertices and an edge connects every one of the $\frac12n(n-1)$ pairs of vertices. See Fig. 9.1.

For a graph $G=(V,E)$ of size $n$, a clique of size $k\in[n]$ is a complete subgraph $K$ of $G$ of size $k$; that is, $K=(V_K,E_K)$, where $V_K\subset V$, $|V_K|=k$ and $E_K=V_K^{(2)}$. See Fig. 9.2.

Consider the vertex set $V=[n]$. Now construct the edge set $E\subset[n]^{(2)}$ in the following random fashion. Let $p\in(0,1)$. For each pair $\{x,y\}\in[n]^{(2)}$, toss a coin with probability $p$ of heads and $1-p$ of tails. If heads occurs, include the pair $\{x,y\}$ in $E$, and if tails occurs, do not include it in $E$. Do this independently for every pair $\{x,y\}\in[n]^{(2)}$. Denote the resulting random edge set by $E_n(p)$. The resulting random graph is sometimes called an Erdős–Rényi graph; it will be denoted by $G_n(p)=([n],E_n(p))$. In this chapter, the generic notation $P$ for probability and $E$ for expectation will be used throughout.

Fig. 9.1 The complete graph with 5 vertices, $G=K_5$

Fig. 9.2 A graph with 10 vertices and 13 edges. The largest clique is the one of size 4, formed by the vertices $\{4,5,6,7\}$

To get a feeling for how many edges one expects to see in the random graph, attach to each of the $N:=\frac12n(n-1)$ potential edges a random variable which is equal to 1 if the edge exists in the random set of edges $E_n(p)$ and is equal to 0 if the edge does not exist in $E_n(p)$. Denote these random variables by $\{W_m\}_{m=1}^{N}$. The random variables are distributed according to the Bernoulli distribution with parameter $p$; that is, $P(W_m=1)=1-P(W_m=0)=p$. Thus, the expectation and the variance of $W_m$ are given by $EW_m=p$ and $\sigma^2(W_m)=p(1-p)$. Let $S_N=\sum_{m=1}^{N}W_m$ denote the number of edges in the random graph. By the linearity of the expectation, one has $ES_N=Np$. Because edges have been selected independently, the random variables $\{W_m\}_{m=1}^{N}$ are independent. Thus, the variance of $S_N$ is the sum of the variances of $\{W_m\}_{m=1}^{N}$; that is, $\sigma^2(S_N)=Np(1-p)$. Therefore, Chebyshev’s inequality gives
$$P(|S_N-Np|\ge N^{\frac12+\epsilon})\le\frac{Np(1-p)}{N^{1+2\epsilon}}.$$
Consequently, for any $\epsilon>0$, one has $\lim_{N\to\infty}P(|S_N-Np|\ge N^{\frac12+\epsilon})=0$. Thus, for any $\epsilon>0$ and large $n$ (depending on $\epsilon$), with high probability the Erdős–Rényi graph $G_n(p)$ will have $\frac12n^2p+O(n^{1+\epsilon})$ edges.
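The concentration of $S_N$ around $Np$ can be seen in a single simulated sample of $G_n(p)$. This is a sketch using Python's standard `random` module. The helper `sample_edge_count`, the parameters $n=400$, $p=0.3$ and the seed are our own illustrative choices. Each pair is kept when `rng.random() < p`, which happens with probability $p$.

```python
import math
import random

def sample_edge_count(n, p, rng):
    """Number of edges S_N of one sample of G_n(p): each of the N = n(n-1)/2 pairs
    {x, y} of [n] is kept independently with probability p."""
    return sum(1 for x in range(1, n + 1) for y in range(x + 1, n + 1) if rng.random() < p)

n, p = 400, 0.3
N = n * (n - 1) // 2
rng = random.Random(12345)                 # fixed seed, so the run is reproducible
S = sample_edge_count(n, p, rng)
sd = math.sqrt(N * p * (1 - p))            # sigma(S_N) = sqrt(N p (1 - p))
print(S, N * p, (S - N * p) / sd)          # deviation in units of the standard deviation
```

The deviation is typically a few standard deviations at most, that is, of order $N^{1/2}$, far below the $N^{\frac12+\epsilon}$ window allowed by the Chebyshev bound.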

The main question we address in this chapter is this: how large is the largest complete subgraph, that is, the largest clique, in $G_n(p)$, as $n\to\infty$? We study this question in Sect. 9.2. In Sect. 9.3 we apply the results of Sect. 9.2 to a problem in tampering detection. In Sect. 9.4, we discuss Ramsey theory for cliques in graphs and use random graphs to give a bound on the size of a fundamental deterministic quantity.
9.2 The Size of the Largest Clique in a Random Graph

Let $L_{n,p}$ be the random variable denoting the size of the largest clique in $G_n(p)$. Let $\log^{(2)}_{1/p}n:=\log_{1/p}\log_{1/p}n$.


Theorem 9.1. Let $L_{n,p}$ denote the size of the largest clique in the Erdős–Rényi graph $G_n(p)$. Then
$$\lim_{n\to\infty}P\big(L_{n,p}\ge 2\log_{1/p}n-c\log^{(2)}_{1/p}n\big)=\begin{cases}0,&\text{if }c<2;\\1,&\text{if }c>2.\end{cases}$$


Remark. Despite the increasing randomness and disorder in $G_n(p)$ as $n$ grows, the theorem shows that $L_{n,p}$ behaves almost deterministically: with probability approaching 1 as $n\to\infty$, the size of the largest clique will be very close to $2\log_{1/p}n-2\log^{(2)}_{1/p}n$. In fact, it is known that for each $n$, there exists a value $d_n$ such that $\lim_{n\to\infty}P(L_{n,p}\text{ equals either }d_n\text{ or }d_n+1)=1$. That is, with probability approaching 1 as $n\to\infty$, $L_{n,p}$ is restricted to two specific values. The proof of this is similar to the proof of Theorem 9.1 but a little more delicate; see [9]. We have chosen the formulation in Theorem 9.1 in particular because it is natural for the topic discussed in Sect. 9.3.


Let $N_{n,p}(k)$ be the random variable denoting the number of cliques of size $k$ in the random graph $G_n(p)$. We will always assume tacitly that the argument of $N_{n,p}$ is a positive integer. Of course it follows from Theorem 9.1 that
$$\lim_{n\to\infty}P(N_{n,p}(k_n)=0)=1,\ \text{if }k_n\ge 2\log_{1/p}n-c\log^{(2)}_{1/p}n,\ \text{for some }c<2.$$
We say then that the random variable $N_{n,p}(k_n)$ converges in probability to 0 as $n\to\infty$. The proof of Theorem 9.1 will actually show that if $k_n\le 2\log_{1/p}n-c\log^{(2)}_{1/p}n$, for some $c>2$, then $\lim_{n\to\infty}P(N_{n,p}(k_n)>M)=1$, for any $M\in\mathbb{R}$. We say then that the random variable $N_{n,p}(k_n)$ converges in probability to $\infty$ as $n\to\infty$. We record this as a corollary.


Corollary 9.1.
i. If $k_n\ge 2\log_{1/p}n-c\log^{(2)}_{1/p}n$, for some $c<2$, then $N_{n,p}(k_n)$ converges to 0 in probability; that is,
$$\lim_{n\to\infty}P(N_{n,p}(k_n)=0)=1;$$
ii. If $k_n\le 2\log_{1/p}n-c\log^{(2)}_{1/p}n$, for some $c>2$, then $N_{n,p}(k_n)$ converges to $\infty$ in probability; that is,
$$\lim_{n\to\infty}P(N_{n,p}(k_n)>M)=1,\ \text{for all }M\in\mathbb{R}.$$
Proof of Theorem 9.1. The number of cliques of size $k_n$ in the complete graph $K_n$ is $\binom{n}{k_n}$; denote these cliques by $\{K^n_j:j=1,\dots,\binom{n}{k_n}\}$. Let $I_{K^n_j}$ be the indicator random variable defined to be equal to 1 or 0, according to whether the clique $K^n_j$ is or is not contained in the random graph $G_n(p)$. Then we can represent the random variable $N_{n,p}(k_n)$, denoting the number of cliques of size $k_n$ in the random graph $G_n(p)$, as
$$N_{n,p}(k_n)=\sum_{j=1}^{\binom{n}{k_n}}I_{K^n_j}. \tag{9.1}$$


Let $P(K^n_j)$ denote the probability that the clique $K^n_j$ is contained in $G_n(p)$; that is, the probability that the edges of the clique $K^n_j$ are all contained in the random edge set $E_n(p)$ of $G_n(p)$. Since each clique $K^n_j$ contains $\binom{k_n}{2}$ edges, we have
$$P(K^n_j)=p^{\binom{k_n}{2}}.$$

The expected value $EI_{K^n_j}$ of $I_{K^n_j}$ is given by $EI_{K^n_j}=P(K^n_j)$. Thus, the expected value of $N_{n,p}(k_n)$ is given by
$$EN_{n,p}(k_n)=\sum_{j=1}^{\binom{n}{k_n}}EI_{K^n_j}=\binom{n}{k_n}p^{\binom{k_n}{2}}. \tag{9.2}$$


We will first prove that if $c<2$, then
$$\lim_{n\to\infty}P\big(L_{n,p}\ge 2\log_{1/p}n-c\log^{(2)}_{1/p}n\big)=0. \tag{9.3}$$

We have
$$EN_{n,p}(k_n)\ge P(N_{n,p}(k_n)\ge1)=P(L_{n,p}\ge k_n),$$
where the inequality holds because $N_{n,p}(k_n)$ takes nonnegative integer values, and the equality follows from the fact that a clique of size $l$ contains sub-cliques of size $j$ for all $j\in[l-1]$. Thus, to prove (9.3) it suffices to prove that
$$\lim_{n\to\infty}EN_{n,p}\big(2\log_{1/p}n-c_n\log^{(2)}_{1/p}n\big)=0, \tag{9.4}$$
where $0<c_n\le c<2$, for all $n$. (We have written $c_n$ instead of $c$ in (9.4) because we need the argument of $N_{n,p}$ to be an integer.) This approach to proving (9.3) is known as the first moment method.
To prove (9.4), we need the following lemma.

Lemma 9.1. If $k_n=o(n^{\frac12})$, as $n\to\infty$, then
$$\binom{n}{k_n}\sim\frac{n^{k_n}}{k_n!},\ \text{as }n\to\infty.$$

Proof. We have $\binom{n}{k_n}=\frac{n(n-1)\cdots(n-k_n+1)}{k_n!}$. Thus, to prove the lemma we need to show that
$$\lim_{n\to\infty}\frac{n(n-1)\cdots(n-k_n+1)}{n^{k_n}}=1,$$
or, equivalently,
$$\lim_{n\to\infty}\sum_{j=1}^{k_n-1}\log\Big(1-\frac jn\Big)=0. \tag{9.5}$$
Letting $f(x)=-\log(1-x)$, and applying Taylor’s remainder theorem in the form $f(x)=f(0)+f'(x^*(x))x$, for $x>0$, where $x^*(x)\in(0,x)$, we have, since $f'(y)=\frac{1}{1-y}\le2$ for $0\le y\le\frac12$,
$$0\le-\log(1-x)\le 2x,\quad 0\le x\le\frac12.$$
Thus, for $n$ sufficiently large so that $\frac{k_n}{n}\le\frac12$, we have
$$0\le-\sum_{j=1}^{k_n-1}\log\Big(1-\frac jn\Big)\le 2\sum_{j=1}^{k_n-1}\frac jn=\frac{(k_n-1)k_n}{n}.$$
Letting $n\to\infty$ in the above equation, and using the assumption that $k_n=o(n^{\frac12})$, we obtain (9.5). □
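Lemma 9.1 and the bound in its proof can be checked numerically. The sketch below compares $k_n\approx n^{1/3}$, where the ratio is close to 1, with $k_n\approx n^{1/2}$, where the ratio stays near $e^{-1/2}$ and the lemma's hypothesis fails. `falling_ratio` is our own name. It uses `math.perm` (Python 3.8+), and Python's exact division of large integers produces the float.

```python
import math

def falling_ratio(n, k):
    """n (n-1) ... (n-k+1) / n^k = C(n, k) k! / n^k, the ratio that Lemma 9.1 sends to 1."""
    return math.perm(n, k) / n ** k

for n in (10**4, 10**6):
    k_small = round(n ** (1 / 3))         # k_n = o(n^(1/2)): ratio close to 1
    k_sqrt = math.isqrt(n)                # k_n of order n^(1/2): ratio near e^(-1/2)
    print(n, falling_ratio(n, k_small), falling_ratio(n, k_sqrt), math.exp(-0.5))
```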


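We can now prove (9.4). Let $k_n=2\log_{1/p}n-c_n\log^{(2)}_{1/p}n$, where $0<c_n\le c<2$, for all $n$. Stirling’s formula gives
$$k_n!\sim k_n^{k_n}e^{-k_n}\sqrt{2\pi k_n},\ \text{as }n\to\infty.$$
Using this with Lemma 9.1 (which applies since $k_n=O(\log n)$) and (9.2), we have
$$EN_{n,p}(k_n)=\binom{n}{k_n}p^{\frac{k_n(k_n-1)}{2}}\sim\frac{n^{k_n}}{k_n!}p^{\frac{k_n(k_n-1)}{2}}\sim\frac{n^{k_n}p^{\frac{k_n(k_n-1)}{2}}}{k_n^{k_n}e^{-k_n}\sqrt{2\pi k_n}},\ \text{as }n\to\infty,$$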
and thus
$$\log_{1/p}EN_{n,p}(k_n)=k_n\log_{1/p}n-\frac12k_n^2+\frac12k_n-k_n\log_{1/p}k_n+k_n\log_{1/p}e-\frac12\log_{1/p}2\pi k_n+o(1),\ \text{as }n\to\infty. \tag{9.6}$$
Note that
$$\log_{1/p}k_n=\log_{1/p}\big(2\log_{1/p}n-c_n\log^{(2)}_{1/p}n\big)=\log_{1/p}\Big((\log_{1/p}n)\Big(2-\frac{c_n\log^{(2)}_{1/p}n}{\log_{1/p}n}\Big)\Big)=\log^{(2)}_{1/p}n+\log_{1/p}\Big(2-\frac{c_n\log^{(2)}_{1/p}n}{\log_{1/p}n}\Big)=\log^{(2)}_{1/p}n+O(1),\ \text{as }n\to\infty. \tag{9.7}$$
Substituting for $k_n$ and using (9.7), we have
$$\begin{aligned}k_n\log_{1/p}n-\frac12k_n^2-k_n\log_{1/p}k_n&=\big(2\log_{1/p}n-c_n\log^{(2)}_{1/p}n\big)\log_{1/p}n-\frac12\big(2\log_{1/p}n-c_n\log^{(2)}_{1/p}n\big)^2\\&\quad-\big(2\log_{1/p}n-c_n\log^{(2)}_{1/p}n\big)\big(\log^{(2)}_{1/p}n+O(1)\big)\\&=(c_n-2)(\log_{1/p}n)\log^{(2)}_{1/p}n+O(\log_{1/p}n).\end{aligned} \tag{9.8}$$

Since $\frac12k_n+k_n\log_{1/p}e-\frac12\log_{1/p}2\pi k_n=O(\log_{1/p}n)$, it follows from (9.6), (9.8), and the fact that $0<c_n\le c<2$ that
$$\lim_{n\to\infty}\log_{1/p}EN_{n,p}\big(2\log_{1/p}n-c_n\log^{(2)}_{1/p}n\big)=-\infty.$$
Thus, (9.4) holds, completing the proof of (9.3).
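The first moment computation can be made concrete by evaluating (9.2) exactly and locating the largest $k$ with $EN_{n,p}(k)\ge1$. This is a sketch under our own naming (`log_expected_cliques`, `first_moment_threshold` and `predicted_size` are illustrative). For comparison it also prints $2\log_{1/p}n-2\log^{(2)}_{1/p}n$ from the remark after Theorem 9.1. The two quantities differ by lower-order terms that the theorem does not resolve.

```python
import math

def log_expected_cliques(n, k, p):
    """Natural log of E N_{n,p}(k) = C(n, k) p^{C(k, 2)}, from (9.2)."""
    return math.log(math.comb(n, k)) + math.comb(k, 2) * math.log(p)

def first_moment_threshold(n, p):
    """Largest k for which E N_{n,p}(k) >= 1 (assumes n >= 2, so that k never exceeds n)."""
    k = 1
    while log_expected_cliques(n, k + 1, p) >= 0:
        k += 1
    return k

def predicted_size(n, p):
    """2 log_{1/p} n - 2 log^{(2)}_{1/p} n, the value singled out in the remark after Theorem 9.1."""
    L = math.log(n, 1 / p)
    return 2 * L - 2 * math.log(L, 1 / p)

for n in (10**3, 10**6, 10**9):
    print(n, first_moment_threshold(n, 0.5), round(predicted_size(n, 0.5), 2))
```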
We now prove that if $c>2$, then
$$\lim_{n\to\infty}P\big(L_{n,p}\ge 2\log_{1/p}n-c\log^{(2)}_{1/p}n\big)=1. \tag{9.9}$$
The analysis in the above paragraph shows that if $c_n\ge c>2$, for all $n$, then
$$\lim_{n\to\infty}EN_{n,p}\big(2\log_{1/p}n-c_n\log^{(2)}_{1/p}n\big)=\infty. \tag{9.10}$$


The first moment method used above exploits the fact that (9.4) implies (9.3). Now (9.10) does not imply (9.9). To prove (9.9), we employ the second moment method. (This method was also used in Chap. 3 and Chap. 8.) The variance of $N_{n,p}(k_n)$ is given by
$$\mathrm{Var}(N_{n,p}(k_n))=E\big(N_{n,p}(k_n)-EN_{n,p}(k_n)\big)^2=EN^2_{n,p}(k_n)-\big(EN_{n,p}(k_n)\big)^2. \tag{9.11}$$


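Our goal now is to show that if $k_n=2\log_{1/p}n-c_n\log^{(2)}_{1/p}n$ with $c_n\ge c>2$, for all $n$, then
$$\mathrm{Var}(N_{n,p}(k_n))=o\big((EN_{n,p}(k_n))^2\big),\ \text{as }n\to\infty. \tag{9.12}$$

Chebyshev’s inequality gives for any $\epsilon>0$
$$P\big(|N_{n,p}(k_n)-EN_{n,p}(k_n)|\ge\epsilon EN_{n,p}(k_n)\big)\le\frac{\mathrm{Var}(N_{n,p}(k_n))}{\epsilon^2(EN_{n,p}(k_n))^2}. \tag{9.13}$$
Thus, (9.12) and (9.13) yield
$$\lim_{n\to\infty}P\Big(\Big|\frac{N_{n,p}(k_n)}{EN_{n,p}(k_n)}-1\Big|<\epsilon\Big)=1,\quad\text{for all }\epsilon>0. \tag{9.14}$$
From (9.14) and (9.10), it follows that
$$\lim_{n\to\infty}P(N_{n,p}(k_n)>M)=1,\quad\text{for all }M\in\mathbb{R}. \tag{9.15}$$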


In particular then, (9.9) follows from (9.15). Thus, the proof of the theorem will be complete when we prove (9.12), or, in light of (9.11), when we prove that
$$EN^2_{n,p}(k_n)=(EN_{n,p}(k_n))^2+o\big((EN_{n,p}(k_n))^2\big),\ \text{as }n\to\infty. \tag{9.16}$$

We relabel the cliques $\{K^n_j:j=1,\dots,\binom{n}{k_n}\}$ of size $k_n$ in $K_n$ according to the vertices that are contained in each clique. Thus, we write $K^n_{i_1,i_2,\dots,i_{k_n}}$ to denote the clique whose vertices are $i_1,i_2,\dots,i_{k_n}$. The representation for $N_{n,p}(k_n)$ in (9.1) becomes
$$N_{n,p}(k_n)=\sum_{1\le i_1<i_2<\cdots<i_{k_n}\le n}I_{K^n_{i_1,i_2,\dots,i_{k_n}}}. \tag{9.17}$$


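Note that the random variable $I_{K^n_{i_1,\dots,i_{k_n}}}I_{K^n_{l_1,\dots,l_{k_n}}}$ is equal to 1 if the edges of the two cliques $K^n_{i_1,\dots,i_{k_n}}$ and $K^n_{l_1,\dots,l_{k_n}}$ are all contained in $G_n(p)$ and is equal to 0 otherwise. Thus,
$$EI_{K^n_{i_1,\dots,i_{k_n}}}I_{K^n_{l_1,\dots,l_{k_n}}}=P\big(K^n_{i_1,\dots,i_{k_n}}\cup K^n_{l_1,\dots,l_{k_n}}\big),$$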
where $P(K^n_{i_1,\dots,i_{k_n}}\cup K^n_{l_1,\dots,l_{k_n}})$ is the probability that the edges of $K^n_{i_1,\dots,i_{k_n}}$ and $K^n_{l_1,\dots,l_{k_n}}$ are all contained in the random edge set $E_n(p)$ of $G_n(p)$. Consequently, we have
$$EN^2_{n,p}(k_n)=\sum_{\substack{1\le i_1<\cdots<i_{k_n}\le n\\1\le l_1<\cdots<l_{k_n}\le n}}EI_{K^n_{i_1,\dots,i_{k_n}}}I_{K^n_{l_1,\dots,l_{k_n}}}=\sum_{\substack{1\le i_1<\cdots<i_{k_n}\le n\\1\le l_1<\cdots<l_{k_n}\le n}}P\big(K^n_{i_1,\dots,i_{k_n}}\cup K^n_{l_1,\dots,l_{k_n}}\big). \tag{9.18}$$

Now by symmetry considerations, it follows that the sum

$$\sum_{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n} P\left(K_{i_1,\dots,i_{k_n}} \cup K_{l_1,\dots,l_{k_n}}\right)$$

over all $k_n$-tuples $1 \le l_1 < l_2 < \cdots < l_{k_n} \le n$ is independent of the particular
choice of $k_n$-tuple $i_1, i_2, \dots, i_{k_n}$. (The reader should verify this.) For convenience,
we select the $k_n$-tuple $1, 2, \dots, k_n$. Since there are $\binom{n}{k_n}$ different $k_n$-tuples, we have

$$EN^2_{n,p}(k_n) = \binom{n}{k_n} \sum_{1 \le l_1 < l_2 < \cdots < l_{k_n} \le n} P\left(K_{1,2,\dots,k_n} \cup K_{l_1,\dots,l_{k_n}}\right). \tag{9.19}$$


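Let

$$J = J(l_1, l_2, \dots, l_{k_n}) = \left|[k_n] \cap \{l_1, l_2, \dots, l_{k_n}\}\right|$$

denote the number of vertices shared by the cliques $K_{1,2,\dots,k_n}$ and $K_{l_1,l_2,\dots,l_{k_n}}$. Each
of these two cliques has $\binom{k_n}{2}$ edges. Since the cliques share J vertices, the number
of edges in $K_{1,2,\dots,k_n} \cup K_{l_1,\dots,l_{k_n}}$ is equal to $2\binom{k_n}{2} - \binom{J}{2}$, if $J \ge 2$, and is equal to
$2\binom{k_n}{2}$, if J = 0 or J = 1. Thus,

$$P\left(K_{1,2,\dots,k_n} \cup K_{l_1,\dots,l_{k_n}}\right) = \begin{cases} p^{2\binom{k_n}{2} - \binom{J}{2}}, & \text{if } J(l_1, l_2, \dots, l_{k_n}) \ge 2;\\ p^{2\binom{k_n}{2}}, & \text{if } J(l_1, l_2, \dots, l_{k_n}) \le 1. \end{cases} \tag{9.20}$$

Substituting (9.20) into (9.19), we have

$$EN^2_{n,p}(k_n) = \binom{n}{k_n} \sum_{\substack{1 \le l_1 < \cdots < l_{k_n} \le n\\ J(l_1,\dots,l_{k_n}) \ge 2}} p^{2\binom{k_n}{2} - \binom{J(l_1,\dots,l_{k_n})}{2}} + \binom{n}{k_n} \sum_{\substack{1 \le l_1 < \cdots < l_{k_n} \le n\\ J(l_1,\dots,l_{k_n}) \le 1}} p^{2\binom{k_n}{2}}. \tag{9.21}$$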
Keep in mind that our aim is to prove (9.16). We will do this by showing that
the first term on the right hand side of (9.21) is equal to $o((EN_{n,p}(k_n))^2)$ and
that the second term on the right hand side of (9.21) is equal to $(EN_{n,p}(k_n))^2 +
o((EN_{n,p}(k_n))^2)$.

In order to analyze the two terms on the right hand side of (9.21), we need to
count the number of $k_n$-tuples $l_1, l_2, \dots, l_{k_n}$ for which $J(l_1, l_2, \dots, l_{k_n}) = j$, for
$j = 0, 1, \dots, k_n$. Denote this number by $\#(j)$. In order that $J(l_1, l_2, \dots, l_{k_n}) = j$,
we need to choose j of the vertices of $l_1, l_2, \dots, l_{k_n}$ from the set $[k_n]$ and the other
$k_n - j$ vertices of $l_1, l_2, \dots, l_{k_n}$ from the set $[n] - [k_n]$. Thus,

$$\#(j) = \binom{k_n}{j}\binom{n - k_n}{k_n - j}, \quad j = 0, 1, \dots, k_n. \tag{9.22}$$


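We first show that the second term on the right hand side of (9.21) is equal to
$(EN_{n,p}(k_n))^2 + o((EN_{n,p}(k_n))^2)$. Using (9.22), we have

$$\begin{aligned} \binom{n}{k_n} \sum_{J(l_1,\dots,l_{k_n}) \le 1} p^{2\binom{k_n}{2}} &= \binom{n}{k_n} p^{2\binom{k_n}{2}} \left[\binom{k_n}{0}\binom{n-k_n}{k_n} + \binom{k_n}{1}\binom{n-k_n}{k_n-1}\right] \\ &= \binom{n}{k_n}^2 p^{2\binom{k_n}{2}}\, \frac{\binom{n-k_n}{k_n} + k_n\binom{n-k_n}{k_n-1}}{\binom{n}{k_n}} = (EN_{n,p}(k_n))^2\, \frac{\binom{n-k_n}{k_n} + k_n\binom{n-k_n}{k_n-1}}{\binom{n}{k_n}}, \end{aligned} \tag{9.23}$$

where (9.2) was used for the final equality. By Lemma 9.1, $\binom{n}{k_n} \sim \frac{n^{k_n}}{k_n!}$, and applying
Lemma 9.1 with n replaced by $n - k_n$, we have

$$\binom{n-k_n}{k_n} \sim \frac{(n-k_n)^{k_n}}{k_n!} = \frac{n^{k_n}}{k_n!}\left(1 - \frac{k_n}{n}\right)^{k_n} \sim \frac{n^{k_n}}{k_n!},$$

since $k_n = o(n^{1/2})$. Of course then also $\binom{n-k_n}{k_n-1} \sim \frac{n^{k_n-1}}{(k_n-1)!}$. Thus,

$$\frac{\binom{n-k_n}{k_n} + k_n\binom{n-k_n}{k_n-1}}{\binom{n}{k_n}} \sim \frac{\frac{n^{k_n}}{k_n!} + k_n\frac{n^{k_n-1}}{(k_n-1)!}}{\frac{n^{k_n}}{k_n!}} = 1 + \frac{k_n^2}{n}. \tag{9.24}$$

From (9.23) and (9.24), and since $\frac{k_n^2}{n} \to 0$, we conclude that the second term on the right hand side
of (9.21) is equal to $(EN_{n,p}(k_n))^2 + o((EN_{n,p}(k_n))^2)$.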
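Dividing (9.21) by $(EN_{n,p}(k))^2 = \binom{n}{k}^2 p^{2\binom{k}{2}}$ and using (9.22) gives the exact identity $\frac{EN^2_{n,p}(k)}{(EN_{n,p}(k))^2} = \sum_{j=0}^{k} \frac{\#(j)}{\binom{n}{k}}\, p^{-\binom{j}{2}}$ (for $j \le 1$ the exponent $\binom{j}{2}$ vanishes, matching the second case of (9.20)). The Python sketch below (function names hypothetical) evaluates this ratio, and the $J \le 1$ part appearing in (9.23), in exact rational arithmetic; by (9.11), the ratio minus 1 equals $\mathrm{Var}(N_{n,p}(k))/(EN_{n,p}(k))^2$.

```python
from fractions import Fraction
from math import comb

def second_moment_ratio(n: int, k: int, p: Fraction) -> Fraction:
    """E N^2 / (E N)^2 for k-cliques in G_n(p): sum_j #(j) p^(-C(j, 2)) / C(n, k)."""
    total = Fraction(0)
    for j in range(k + 1):
        overlap_count = comb(k, j) * comb(n - k, k - j)   # #(j) from (9.22); comb gives 0 if k - j > n - k
        total += Fraction(overlap_count) / p ** comb(j, 2)
    return total / comb(n, k)

def low_overlap_part(n: int, k: int) -> Fraction:
    """The ratio in the last expression of (9.23), i.e. the J <= 1 contribution."""
    return Fraction(comb(n - k, k) + k * comb(n - k, k - 1), comb(n, k))

p = Fraction(1, 2)
for n in (50, 200, 1000):
    print(n, float(second_moment_ratio(n, 4, p) - 1), float(low_overlap_part(n, 4)))
```

With p = 1 every term has $p^{-\binom j2} = 1$ and the ratio is exactly 1 by Vandermonde's identity, which is a convenient check of the bookkeeping.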

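Now we consider the first term on the right hand side of (9.21). Of course,
$\binom{k_n}{j} \le \frac{k_n^j}{j!}$ and $\binom{n-k_n}{k_n-j} \le \frac{n^{k_n-j}}{(k_n-j)!}$. Also, by Lemma 9.1, $\binom{n}{k_n} \sim \frac{n^{k_n}}{k_n!}$. Using these
estimates and (9.22), and recalling from (9.2) that $(EN_{n,p}(k_n))^2 = \binom{n}{k_n}^2 p^{2\binom{k_n}{2}}$,
we can estimate the first term on the right hand side of (9.21) by

$$\begin{aligned} \binom{n}{k_n} \sum_{J(l_1,\dots,l_{k_n}) \ge 2} p^{2\binom{k_n}{2} - \binom{J}{2}} &= \binom{n}{k_n} p^{2\binom{k_n}{2}} \sum_{j=2}^{k_n} \binom{k_n}{j}\binom{n-k_n}{k_n-j} p^{-\binom{j}{2}} = (EN_{n,p}(k_n))^2 \sum_{j=2}^{k_n} \frac{\binom{k_n}{j}\binom{n-k_n}{k_n-j}}{\binom{n}{k_n}}\, p^{-\binom{j}{2}} \\ &\le (1 + o(1))\,(EN_{n,p}(k_n))^2 \sum_{j=2}^{k_n} \frac{k_n^j}{j!}\, \frac{n^{k_n-j}}{(k_n-j)!}\, \frac{k_n!}{n^{k_n}}\, p^{-\binom{j}{2}} \le (1 + o(1))\,(EN_{n,p}(k_n))^2 \sum_{j=2}^{k_n} \frac{k_n^{2j}}{j!\, n^j}\, p^{-\binom{j}{2}}, \end{aligned} \tag{9.25}$$

where the last step uses $\frac{k_n!}{(k_n-j)!} \le k_n^j$.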

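By Stirling's formula, $j! \sim j^j e^{-j}\sqrt{2\pi j}$, as $j \to \infty$, and thus there exists a
constant $C > 0$ such that

$$j! \ge C j^j e^{-j}, \text{ for all } j \ge 2. \tag{9.26}$$

It is easy to check that $j\, p^{\frac{j-1}{2}}$ is decreasing in j for j sufficiently large. Using this
and the fact that $\lim_{j\to\infty} j\, p^{\frac{j-1}{2}} = 0$, it follows that

$$\min_{2 \le j \le k_n} j\, p^{\frac{j-1}{2}} = k_n\, p^{\frac{k_n-1}{2}}, \text{ for sufficiently large } k_n. \tag{9.27}$$

Using (9.26) for the first inequality below and (9.27) for the second inequality below,
for sufficiently large n the summation in the last term on the right hand side of (9.25)
can be estimated by

$$\sum_{j=2}^{k_n} \frac{k_n^{2j}}{j!\, n^j}\, p^{-\binom{j}{2}} \le \frac{1}{C} \sum_{j=2}^{k_n} \left(\frac{e\, k_n^2}{j\, n\, p^{\frac{j-1}{2}}}\right)^j \le \frac{1}{C} \sum_{j=2}^{k_n} \left(\frac{e\, k_n^2}{k_n\, n\, p^{\frac{k_n-1}{2}}}\right)^j = \frac{1}{C} \sum_{j=2}^{k_n} \left(\frac{\sqrt{p}\, e\, k_n}{n\, p^{\frac{k_n}{2}}}\right)^j. \tag{9.28}$$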


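Using the fact that $k_n = 2\log_{1/p} n - c_n\log^{(2)}_{1/p} n$ with $c_n \ge c > 2$, we now show
that

$$\lim_{n\to\infty} \frac{\sqrt{p}\, e\, k_n}{n\, p^{\frac{k_n}{2}}} = \sqrt{p}\, e \lim_{n\to\infty} \frac{k_n}{n\, p^{\frac{k_n}{2}}} = 0. \tag{9.29}$$

Using (9.7) (which of course holds for $\{c_n\}_{n=1}^\infty$ as above) for the second equality
below, we have

$$\begin{aligned} \log_{1/p} \frac{k_n}{n\, p^{\frac{k_n}{2}}} &= \log_{1/p} k_n - \log_{1/p} n - \frac{k_n}{2}\log_{1/p} p = \log^{(2)}_{1/p} n + O(1) - \log_{1/p} n + \frac{k_n}{2} \\ &= \log^{(2)}_{1/p} n + O(1) - \log_{1/p} n + \log_{1/p} n - \frac{c_n}{2}\log^{(2)}_{1/p} n = \left(1 - \frac{c_n}{2}\right)\log^{(2)}_{1/p} n + O(1), \end{aligned}$$

as $n \to \infty$. Since $c_n \ge c > 2$, it follows from this that $\lim_{n\to\infty} \log_{1/p} \frac{k_n}{n\, p^{k_n/2}} = -\infty$ and
consequently that (9.29) holds. From (9.25), (9.28), and (9.29) we conclude that the
first term on the right hand side of (9.21) is indeed $o((EN_{n,p}(k_n))^2)$ (writing $q_n$ for the
quantity in (9.29), the right hand side of (9.28) is at most $\frac{q_n^2}{C(1 - q_n)}$ once $q_n < 1$, and this
tends to 0). This completes the proof of Theorem 9.1.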
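The convergence in (9.29) is slow: by the display above, $\log_{1/p}\frac{k_n}{np^{k_n/2}}$ behaves like $(1 - \frac{c}{2})\log^{(2)}_{1/p} n$. The Python sketch below (helper name hypothetical; $k_n$ is treated as a real number, ignoring rounding to an integer) evaluates this logarithm, taking $\log_{1/p} n$ itself as input so that astronomically large n can be handled.

```python
import math

def log_geometric_ratio(log_n: float, p: float, c: float) -> float:
    """log_{1/p} of k_n / (n p^{k_n/2}), where k_n = 2 log_{1/p} n - c log_{1/p} log_{1/p} n.

    log_n is log_{1/p} n. Uses log_{1/p}(k / (n p^{k/2})) = log_{1/p} k - log_{1/p} n + k/2.
    """
    base = math.log(1 / p)
    k = 2 * log_n - c * math.log(log_n) / base
    return math.log(k) / base - log_n + k / 2

p, c = 0.5, 3.0
for log_n in (64, 256, 1024, 4096):
    loglog = math.log(log_n) / math.log(1 / p)
    print(log_n, log_geometric_ratio(log_n, p, c), (1 - c / 2) * loglog)
```

For p = 1/2 and c = 3 the computed values track $1 - \frac12\log_2\log_2 n$ closely, so the ratio in (9.29) does tend to 0, but only at a polylogarithmic rate.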


9.3 Detecting Tampering in a Random Graph 


The tampering detection problem we discuss is intimately related to Theorem 9.1
and Corollary 9.1. Consider the random graph $G_n(p) = ([n], E_n(p))$. Of course,
$E_n(p) \subset [n]^{(2)}$ is a random subset of $[n]^{(2)}$. Consider now the complete graph $K_n$,
whose edge set is $[n]^{(2)}$. Let $k_n$ satisfy $1 \le k_n \le n$. There are $\binom{n}{k_n}$ different cliques
of size $k_n$ in $K_n$. We choose one of these $\binom{n}{k_n}$ cliques at random and “add” all of
its edges to the random edge set $E_n(p)$ (of course some of these additional edges
might already be in $E_n(p)$); that is, we take the union of $E_n(p)$ and the edges of the
randomly chosen clique. We denote this new augmented edge set by $E_n^{\mathrm{tam};k_n}(p)$ and
denote the corresponding tampered graph by $G_n^{\mathrm{tam};k_n}(p)$. See Fig. 9.3.

The question we ask is whether one can detect the tampering asymptotically as
$n \to \infty$. Of course, we need to define what we mean by detecting the tampering.
For this we need to define a distance between measures.

Consider a finite set $\Omega$ and consider probability measures $\mu$ and $\nu$ on $\Omega$. We
define the total variation distance between $\mu$ and $\nu$ by

$$D_{TV}(\mu, \nu) = \max_{A \subset \Omega} |\mu(A) - \nu(A)|. \tag{9.30}$$


Fig. 9.3. The graph from Fig. 9.2 of size n = 10 has been tampered with by adding to it the clique
of size $k_n = 3$ formed by the vertices {3, 6, 10}
In Exercise 9.1, the reader is asked to show that the distance $D_{TV}(\mu, \nu)$ can be
written in two other fashions:

$$D_{TV}(\mu, \nu) = \max_{A \subset \Omega} \left(\mu(A) - \nu(A)\right) = \frac{1}{2}\sum_{x \in \Omega} |\mu(x) - \nu(x)|. \tag{9.31}$$

It is easy to see that $D_{TV}(\mu, \nu)$ takes on values in [0, 1], vanishes if and only if
$\mu = \nu$, and equals 1 if and only if $\mu$ and $\nu$ are mutually singular. We recall that
two probability measures $\mu$ and $\nu$ are called mutually singular if there exists a subset
$A \subset \Omega$ such that $\mu(A) = \nu(\Omega - A) = 1$ (and then of course $\mu(\Omega - A) = \nu(A) = 0$).
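The equality of (9.30) with the right-hand form of (9.31) is easy to check numerically on a small finite set $\Omega$. In the Python sketch below (function names hypothetical), a probability measure is represented as a dict from points of $\Omega$ to their masses; the maximization over subsets is exponential in $|\Omega|$ and is only meant for tiny examples.

```python
from itertools import chain, combinations

def tv_half_l1(mu: dict, nu: dict) -> float:
    """D_TV via the right-hand form of (9.31): half the L1 distance between the masses."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

def tv_max_over_subsets(mu: dict, nu: dict) -> float:
    """D_TV via the definition (9.30): max over all subsets A of |mu(A) - nu(A)|."""
    omega = sorted(set(mu) | set(nu))
    subsets = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
    return max(abs(sum(mu.get(x, 0.0) - nu.get(x, 0.0) for x in A)) for A in subsets)

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_half_l1(mu, nu), tv_max_over_subsets(mu, nu))  # both 0.3 up to rounding
```

Two point masses at different points are mutually singular, and both functions return 1 for them.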

Consider now an $\Omega$-valued random variable X (defined on some probability space
$(S, P)$). The random variable X induces a probability measure $\mu_X$ on $\Omega$, namely
for any subset $A \subset \Omega$, we define $\mu_X(A) = P(X \in A)$. This probability measure is
called the distribution of X. Given two random variables X, Y taking values in $\Omega$,
we define the total variation distance between them by

$$D_{TV}(X, Y) := D_{TV}(\mu_X, \mu_Y).$$

We now apply the above concepts to the random graph. The original random
graph $G_n(p)$ has as its edge set $E_n(p)$, whereas the tampered random graph
$G_n^{\mathrm{tam};k_n}(p)$ has the augmented edge set $E_n^{\mathrm{tam};k_n}(p)$. Each of the random variables
$E_n(p)$ and $E_n^{\mathrm{tam};k_n}(p)$ takes values in the space $\mathcal{P}([n]^{(2)}) := 2^{[n]^{(2)}}$, the set of all
subsets of $[n]^{(2)}$. (Given a set A, the set of all subsets of A is sometimes denoted by
$2^A$; it is known as the power set of A.) We define the tamper detection problem as
follows.


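Definition.

i. If
$$\lim_{n\to\infty} D_{TV}\left(E_n(p), E_n^{\mathrm{tam};k_n}(p)\right) = 0,$$
we say that the tampering is strongly undetectable.
ii. If
$$\lim_{n\to\infty} D_{TV}\left(E_n(p), E_n^{\mathrm{tam};k_n}(p)\right) = 1,$$
we say that the tampering is detectable.
iii. If
$$\liminf_{n\to\infty} D_{TV}\left(E_n(p), E_n^{\mathrm{tam};k_n}(p)\right) > 0 \text{ and } \limsup_{n\to\infty} D_{TV}\left(E_n(p), E_n^{\mathrm{tam};k_n}(p)\right) < 1,$$
we say that the tampering is weakly undetectable.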
We will prove the following theorem.


Theorem 9.2. Consider the Erdős–Rényi graph $G_n(p)$ with random edge set
$E_n(p)$ and consider the tampered graph $G_n^{\mathrm{tam};k_n}(p)$ obtained by choosing at random
a clique of size $k_n$ from the complete graph $K_n$ and adjoining its edges to $E_n(p)$ to
create the augmented edge set $E_n^{\mathrm{tam};k_n}(p)$.

i. If $k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c < 2$, then the tampering is detectable;
that is, $\lim_{n\to\infty} D_{TV}(E_n(p), E_n^{\mathrm{tam};k_n}(p)) = 1$.
ii. If $k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, then the tampering is strongly
undetectable; that is, $\lim_{n\to\infty} D_{TV}(E_n(p), E_n^{\mathrm{tam};k_n}(p)) = 0$.


Remark. In light of Theorem 9.1 and Corollary 9.1, Theorem 9.2 seems quite intuitive. Indeed, if
$k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, with $c < 2$, then $N_{n,p}(k_n)$, the number of cliques of
size $k_n$ in the random graph $G_n(p)$, converges to 0 in probability. However, by
construction, the tampered graph will always have such a clique. Thus, clearly, one
can distinguish between the corresponding measures. On the other hand, if $k_n \le
2\log_{1/p} n - c\log^{(2)}_{1/p} n$, with $c > 2$, then $N_{n,p}(k_n)$ converges to $\infty$ in probability. That
is, for arbitrary M, the number of cliques of size $k_n$ in $G_n(p)$ will be larger than
M with probability approaching 1 as $n \to \infty$. Since the tampered graph $G_n^{\mathrm{tam};k_n}(p)$
is obtained from the original graph $G_n(p)$ by adjoining a randomly chosen clique
of size $k_n$ from the complete graph $K_n$, and since the number of cliques of size
$k_n$ in $G_n(p)$ grows unboundedly as $n \to \infty$ with probability approaching 1, it
seems intuitive that the addition of a single randomly chosen clique would hardly
be felt, and that asymptotically, the two graphs would be indistinguishable. Despite
the above intuition, which leads to the correct answer in the present situation, there
are situations in which this intuition leads one astray. See the notes at the end of
the chapter.


Proof. For notational clarity at a certain point in the proof, we will denote $N_{n,p}(k_n)$,
the random variable denoting the number of cliques of size $k_n$ in the random graph
$G_n(p)$, by $N^{(k_n)}_{n,p}$. For the proof of part (ii) of the theorem, we will need the weak
law of large numbers for the random variable $N^{(k_n)}_{n,p}$:

If $k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, then for all $\epsilon > 0$,

$$\lim_{n\to\infty} P\left(\left|\frac{N^{(k_n)}_{n,p}}{EN^{(k_n)}_{n,p}} - 1\right| < \epsilon\right) = 1. \tag{9.32}$$

This result was actually proved in the course of the proof of Theorem 9.1; it appears
as (9.14).

Let $\mu_n$ denote the distribution of the random variable $E_n(p)$ and let $\mu_{n;\mathrm{tam}}$ denote
the distribution of the random variable $E_n^{\mathrm{tam};k_n}(p)$. Let $\{K_j^{k_n} : j = 1, \dots, \binom{n}{k_n}\}$
denote the $\binom{n}{k_n}$ cliques of size $k_n$ in the complete graph $K_n$. Recall that $\mathcal{P}([n]^{(2)})$
denotes the set of subsets of $[n]^{(2)}$; thus, a point $\omega \in \mathcal{P}([n]^{(2)})$ is a subset of
$[n]^{(2)}$, while a subset $A \subset \mathcal{P}([n]^{(2)})$ is a collection of subsets of $[n]^{(2)}$. Denote by
$A_j^{k_n} \subset \mathcal{P}([n]^{(2)})$ the subset of $\mathcal{P}([n]^{(2)})$ consisting of all those subsets of $[n]^{(2)}$ which
contain all of the $\binom{k_n}{2}$ edges of the clique $K_j^{k_n}$. Let $A^{k_n} = \cup_{j=1}^{\binom{n}{k_n}} A_j^{k_n} \subset \mathcal{P}([n]^{(2)})$ denote
the set of all those subsets of $[n]^{(2)}$ which possess at least one clique of size $k_n$.
The tampered graph is obtained by choosing at random one of the $\binom{n}{k_n}$ cliques of
size $k_n$ in $K_n$ and adding all of its edges to the original random edge set $E_n(p)$. That
is, one of the $K_j^{k_n}$, $1 \le j \le \binom{n}{k_n}$, is chosen at random, and its edges are adjoined to
$E_n(p)$ to form $E_n^{\mathrm{tam};k_n}(p)$. Of course then, by construction, the tampered edge set
$E_n^{\mathrm{tam};k_n}(p)$ must possess a clique of size $k_n$; thus,

$$E_n^{\mathrm{tam};k_n}(p) \in A^{k_n}. \tag{9.33}$$


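We first prove part (i) of the theorem. Let $k_n \ge 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some
$c < 2$. By Corollary 9.1 (or Theorem 9.1), the probability of there being at least one
clique of size $k_n$ in $E_n(p)$ converges to 0 as $n \to \infty$; thus,

$$\lim_{n\to\infty} \mu_n(A^{k_n}) = 0.$$

On the other hand, by (9.33),

$$\mu_{n;\mathrm{tam}}(A^{k_n}) = 1, \text{ for all } n.$$

Consequently,

$$D_{TV}\left(E_n(p), E_n^{\mathrm{tam};k_n}(p)\right) = D_{TV}(\mu_n, \mu_{n;\mathrm{tam}}) = \max_{A \subset \mathcal{P}([n]^{(2)})} |\mu_n(A) - \mu_{n;\mathrm{tam}}(A)| \ge |\mu_n(A^{k_n}) - \mu_{n;\mathrm{tam}}(A^{k_n})| = 1 - \mu_n(A^{k_n}) \to 1, \text{ as } n \to \infty.$$

Since $D_{TV} \le 1$, this proves part (i).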

We now prove part (ii). The conditional $\mu_n$-probability that a set $A \subset \mathcal{P}([n]^{(2)})$
occurs given that the set $A_j^{k_n} \subset \mathcal{P}([n]^{(2)})$ occurs is denoted by $\mu_n(A|A_j^{k_n})$ and is
given by $\mu_n(A|A_j^{k_n}) = \frac{\mu_n(A \cap A_j^{k_n})}{\mu_n(A_j^{k_n})}$. From the description of the construction of the
tampered graph in the first paragraph of this section, along with the fact that under
$\mu_n$ the existence of any particular edge is independent of the existence of any other
particular edges, it follows that

$$\mu_{n;\mathrm{tam}}(A) = \frac{1}{\binom{n}{k_n}} \sum_{j=1}^{\binom{n}{k_n}} \mu_n(A|A_j^{k_n}), \text{ for } A \subset \mathcal{P}([n]^{(2)}). \tag{9.34}$$

(The reader should verify this.)
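For a point $\omega \in \mathcal{P}([n]^{(2)})$, we write $\{\omega\} \subset \mathcal{P}([n]^{(2)})$ to denote the subset of
$\mathcal{P}([n]^{(2)})$ consisting of the singleton $\omega$. Note that

$$\mu_n(\{\omega\} \cap A_j^{k_n}) = \begin{cases} \mu_n(\{\omega\}), & \text{if } \omega \in A_j^{k_n};\\ 0, & \text{otherwise.} \end{cases}$$

Consequently, from the definition of $N^{(k_n)}_{n,p}$ and the definition of $\{A_j^{k_n} : j =
1, \dots, \binom{n}{k_n}\}$, we have

$$\sum_{j=1}^{\binom{n}{k_n}} \mu_n(\{\omega\} \cap A_j^{k_n}) = \mu_n(\{\omega\})\, N^{(k_n)}_{n,p}(\omega), \quad \omega \in \mathcal{P}([n]^{(2)}). \tag{9.35}$$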


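Note that $\mu_n(A_j^{k_n}) = p^{\binom{k_n}{2}}$, for all j. Recall from (9.2) that $EN^{(k_n)}_{n,p} = \binom{n}{k_n} p^{\binom{k_n}{2}}$.
Using these facts with (9.34) and (9.35), we have

$$\mu_{n;\mathrm{tam}}(\{\omega\}) = \frac{1}{\binom{n}{k_n}} \sum_{j=1}^{\binom{n}{k_n}} \mu_n(\{\omega\}|A_j^{k_n}) = \frac{1}{\binom{n}{k_n}} \sum_{j=1}^{\binom{n}{k_n}} \frac{\mu_n(\{\omega\} \cap A_j^{k_n})}{\mu_n(A_j^{k_n})} = \frac{\mu_n(\{\omega\})\, N^{(k_n)}_{n,p}(\omega)}{\binom{n}{k_n} p^{\binom{k_n}{2}}} = \frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}}\, \mu_n(\{\omega\}). \tag{9.36}$$

Equation (9.36) shows that the probability measure $\mu_{n;\mathrm{tam}}$ is the tilted probability
measure of $\mu_n$, tilted by the random variable $N^{(k_n)}_{n,p}$.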
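For very small n, (9.34) through (9.36) can be verified by exhaustive enumeration of $\mathcal{P}([n]^{(2)})$. The Python sketch below (helper name hypothetical; vertices are labeled 0, ..., n−1 for convenience) builds $\mu_n$ and $\mu_{n;\mathrm{tam}}$ exactly with rational arithmetic, the latter directly from the construction (a uniformly chosen clique united with $E_n(p)$), compares $\mu_{n;\mathrm{tam}}$ with the tilted measure $\frac{N^{(k)}_{n,p}}{EN^{(k)}_{n,p}}\mu_n$, and also reports $D_{TV}(\mu_n, \mu_{n;\mathrm{tam}})$ via (9.31). The enumeration has $2^{\binom{n}{2}}$ points, so it is only feasible for n up to about 6.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def tamper_check(n: int, k: int, p: Fraction):
    edges = list(combinations(range(n), 2))                         # [n]^(2)
    cliques = [frozenset(combinations(c, 2)) for c in combinations(range(n), k)]
    m = len(edges)
    # every point omega of P([n]^(2)), i.e. every possible edge set
    omegas = [frozenset(e for i, e in enumerate(edges) if (mask >> i) & 1) for mask in range(2 ** m)]
    mu = {w: p ** len(w) * (1 - p) ** (m - len(w)) for w in omegas}   # law of E_n(p)
    # law of the tampered edge set: a uniformly chosen clique united with E_n(p)
    mu_tam = dict.fromkeys(omegas, Fraction(0))
    for w in omegas:
        for K in cliques:
            mu_tam[w | K] += mu[w] / len(cliques)
    expected_N = comb(n, k) * p ** comb(k, 2)                         # E N^{(k)}_{n,p}, from (9.2)
    tilted = {w: Fraction(sum(K <= w for K in cliques)) / expected_N * mu[w] for w in omegas}
    d_tv = sum(abs(mu[w] - mu_tam[w]) for w in omegas) / 2           # right-hand form of (9.31)
    return mu_tam, tilted, d_tv

mu_tam, tilted, d_tv = tamper_check(4, 3, Fraction(1, 2))
print(mu_tam == tilted, float(d_tv))
```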
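For $\epsilon > 0$, let

$$B_\epsilon^n = \left\{\omega \in \mathcal{P}([n]^{(2)}) : \left|\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\right| < \epsilon\right\}.$$

Since $k_n \le 2\log_{1/p} n - c\log^{(2)}_{1/p} n$, for some $c > 2$, it follows from the law of large
numbers in (9.32) that

$$\lim_{n\to\infty} \mu_n(B_\epsilon^n) = 1. \tag{9.37}$$

From (9.36), we have

$$|\mu_{n;\mathrm{tam}}(B_\epsilon^n) - \mu_n(B_\epsilon^n)| = \left|\sum_{\omega \in B_\epsilon^n} \mu_n(\{\omega\})\left(\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\right)\right| \le \epsilon \sum_{\omega \in B_\epsilon^n} \mu_n(\{\omega\}) = \epsilon\, \mu_n(B_\epsilon^n) \le \epsilon, \tag{9.38}$$

where the first inequality follows from the definition of $B_\epsilon^n$. From (9.37) and (9.38),
it follows that

$$\liminf_{n\to\infty} \mu_{n;\mathrm{tam}}(B_\epsilon^n) \ge 1 - \epsilon. \tag{9.39}$$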


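Now let $A \subset \mathcal{P}([n]^{(2)})$ be arbitrary. Note that (9.38) holds also with $B_\epsilon^n$ replaced
by $A \cap B_\epsilon^n$; so $|\mu_{n;\mathrm{tam}}(A \cap B_\epsilon^n) - \mu_n(A \cap B_\epsilon^n)| \le \epsilon$. Let $(B_\epsilon^n)^c = \mathcal{P}([n]^{(2)}) - B_\epsilon^n$
denote the complement of $B_\epsilon^n$. Then we have

$$\begin{aligned} |\mu_n(A) - \mu_{n;\mathrm{tam}}(A)| &= |\mu_n(A \cap B_\epsilon^n) + \mu_n(A \cap (B_\epsilon^n)^c) - \mu_{n;\mathrm{tam}}(A \cap B_\epsilon^n) - \mu_{n;\mathrm{tam}}(A \cap (B_\epsilon^n)^c)| \\ &\le |\mu_n(A \cap B_\epsilon^n) - \mu_{n;\mathrm{tam}}(A \cap B_\epsilon^n)| + \mu_n(A \cap (B_\epsilon^n)^c) + \mu_{n;\mathrm{tam}}(A \cap (B_\epsilon^n)^c) \\ &\le \epsilon + \mu_n((B_\epsilon^n)^c) + \mu_{n;\mathrm{tam}}((B_\epsilon^n)^c). \end{aligned} \tag{9.40}$$

From (9.40) and the definition of the total variation distance, it follows that

$$D_{TV}\left(E_n(p), E_n^{\mathrm{tam};k_n}(p)\right) = D_{TV}(\mu_n, \mu_{n;\mathrm{tam}}) = \max_{A \subset \mathcal{P}([n]^{(2)})} |\mu_n(A) - \mu_{n;\mathrm{tam}}(A)| \le \epsilon + \mu_n((B_\epsilon^n)^c) + \mu_{n;\mathrm{tam}}((B_\epsilon^n)^c). \tag{9.41}$$

From (9.37), (9.39), (9.41), and the fact that $\epsilon > 0$ is arbitrary, we conclude that

$$\lim_{n\to\infty} D_{TV}\left(E_n(p), E_n^{\mathrm{tam};k_n}(p)\right) = 0. \tag{9.42}$$

□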


Remark. The final two paragraphs of the proof can be replaced by a shorter argument
using $L^2$-convergence and the Cauchy–Schwarz inequality. See Exercise 9.2.


9.4 Ramsey Theory 


Consider the complete graph $K_n$. For each edge in $K_n$, choose either blue or red,
and color the edge with that color. We call this a 2-coloring of $K_n$. For $2 \le k \le n$,
one can ask whether there exists a monochromatic clique of size k, that is, a clique
with all of its edges blue or with all of its edges red. For k = 2, obviously there
exists such a monochromatic clique, for all $n \ge 2$. The fundamental theorem of
Ramsey theory states the following:

For each integer $k \ge 3$, there exists an integer $R(k) > k$ such that if $n \ge R(k)$,
then every 2-coloring of $K_n$ will necessarily have a monochromatic clique of size
k, while if $k \le n < R(k)$, then it is possible to find a 2-coloring of $K_n$ with no
monochromatic clique of size k.
Fig. 9.4 The above example shows that R(3) > 5


Note that this result is purely deterministic: it says that no matter how we arrange
the coloring of $K_n$, there must be a monochromatic clique of size k, if $n \ge R(k)$.
The exact computation of the Ramsey numbers R(k) is notoriously hard. One has
R(3) = 6 and R(4) = 18, but the exact value of R(5) is unknown! See Fig. 9.4.
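For the smallest case, R(3) = 6 can be confirmed by exhaustive search over all $2^{\binom{n}{2}}$ two-colorings of $K_n$. The Python sketch below (function name hypothetical) does this for k = 3; the same brute force is hopeless already for R(4) = 18, since $K_{18}$ has $2^{153}$ two-colorings.

```python
from itertools import combinations

def has_coloring_without_mono_clique(n: int, k: int) -> bool:
    """Search all 2-colorings of K_n for one with no monochromatic clique of size k."""
    edges = list(combinations(range(n), 2))
    index = {e: i for i, e in enumerate(edges)}
    # for each k-subset of vertices, the bit positions of its C(k, 2) edges
    clique_bits = [[index[e] for e in combinations(c, 2)] for c in combinations(range(n), k)]
    for coloring in range(2 ** len(edges)):        # bit i is the color (0 or 1) of edge i
        # a clique is monochromatic iff its edge bits take only one value
        if all(len({(coloring >> b) & 1 for b in bits}) == 2 for bits in clique_bits):
            return True
    return False

print(has_coloring_without_mono_clique(5, 3))  # True, so R(3) > 5 (cf. Fig. 9.4)
print(has_coloring_without_mono_clique(6, 3))  # False, so R(3) <= 6
```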


Remark. It is known that $43 \le R(5) \le 49$. The complete graph $K_{43}$ has $\frac{1}{2} \cdot 43 \cdot
42 = 903$ edges. There are $2^{903}$ different two-colorings of $K_{43}$ and $\binom{43}{5} = 962{,}598$
different cliques of size 5.


We will prove the above fundamental result by providing upper and lower bounds 
on R(k). A nice, elementary combinatorial argument yields the following result. 


Theorem 9.3.
$$R(k) \le 4^{k-1}, \quad k \ge 3. \tag{9.43}$$


Remark. The above estimate is not far from the best known asymptotic upper bound
for R(k). In particular, it is not known if $R(k) \le c^k$, for large k and some $c < 4$.
For the best known upper bound, see [12].


Proof. Let $k \ge 3$. Consider an arbitrary coloring of the complete graph $K_{4^{k-1}}$ of
size $4^{k-1} = 2^{2k-2}$. Define $x_1 = 1$ and $S_0 = K_{4^{k-1}}$. Since $x_1$ shares an edge with
$2^{2k-2} - 1$ vertices, there must be a set of vertices $S_1$ of size at least $2^{2k-3}$ such
that every edge from $x_1$ to a vertex in $S_1$ is the same color. This is the so-called
pigeonhole principle. Let $x_2$ denote the vertex in $S_1$ with the lowest number. By the
same reasoning, since $x_2$ shares an edge with all the other vertices in $S_1$, of which
there are at least $2^{2k-3} - 1$, there must be a set $S_2 \subset S_1$ of size at least $2^{2k-4}$ such
that every edge from $x_2$ to a vertex in $S_2$ has the same color. Continuing like this,
we obtain a sequence $x_1, \dots, x_{2k-2}$ of vertices and a decreasing, nested sequence of
sets of vertices $\{S_j\}_{j=0}^{2k-2}$ such that $x_j \in S_{j-1}$, $j \in [2k-2]$. By the construction,
it follows that for each i, the color of the edge joining $x_i$ to $x_j$ is the same for all
$j > i$. Now look at the $2k - 3$ edges $\{(x_i, x_{i+1})\}_{i=1}^{2k-3}$. Obviously, we can choose
at least $k - 1$ of these edges to be all the same color. Find such a set of $k - 1$ edges and
denote by S the set consisting of the left endpoints $x_i$ of these edges together with the
vertex $x_{2k-2}$. Note that $|S| = k$. Because the color of the edge
joining $x_i$ to $x_j$ is the same for all $j > i$, it follows in fact that the color of the edge
joining any two vertices in S is the same. We have thus exhibited a monochromatic
clique of size k. □
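The proof of Theorem 9.3 is constructive, so it can be run as an algorithm. The following Python sketch (function and helper names hypothetical; vertices numbered 0, ..., n−1) carries out the nested pigeonhole construction and returns the vertices $x_i$, $i \le 2k-3$, whose forward edges carry a majority color, together with $x_{2k-2}$; any two of these are joined by an edge of that color.

```python
import random
from itertools import combinations

def mono_clique_by_pigeonhole(n: int, k: int, color) -> list:
    """Nested pigeonhole construction on K_n, n >= 4**(k - 1).

    color(u, v) returns 0 or 1 and is symmetric. Returns k vertices spanning a
    monochromatic clique.
    """
    assert k >= 2 and n >= 4 ** (k - 1)
    S = list(range(n))                     # S_0, kept sorted so S[0] is the lowest-numbered vertex
    xs, cs = [], []                        # x_j and the common color c_j of all edges from x_j into S_j
    for _ in range(2 * k - 2):
        x, rest = S[0], S[1:]
        classes = ([y for y in rest if color(x, y) == 0],
                   [y for y in rest if color(x, y) == 1])
        c = 0 if len(classes[0]) >= len(classes[1]) else 1   # pigeonhole: keep the larger class
        xs.append(x)
        cs.append(c)
        S = classes[c]                     # S_j, of size at least 2**(2k - 2 - j)
    first = cs[: 2 * k - 3]                # c_1, ..., c_{2k-3}; some color occurs >= k - 1 times
    major = 0 if first.count(0) >= k - 1 else 1
    chosen = [x for x, c in zip(xs, first) if c == major][: k - 1]
    return chosen + [xs[-1]]               # append x_{2k-2}

def random_coloring(n: int, seed: int):
    """A uniformly random 2-coloring of K_n, returned as a symmetric function color(u, v)."""
    rng = random.Random(seed)
    table = {e: rng.randint(0, 1) for e in combinations(range(n), 2)}
    return lambda u, v: table[(min(u, v), max(u, v))]

for k in (3, 4):
    n = 4 ** (k - 1)
    col = random_coloring(n, seed=k)
    clique = mono_clique_by_pigeonhole(n, k, col)
    print(k, clique, {col(u, v) for u, v in combinations(clique, 2)})  # a single color
```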


Despite the fact that the Ramsey number R(k) is a quantity associated with a 
purely deterministic result, one can give a very short and ingenious probabilistic 
proof of a lower bound for R(k). 


Theorem 9.4. $R(k) > k$, for all $k \ge 3$, and

$$R(k) > \frac{1}{e}(1 + o(1))\, k\, 2^{\frac{k}{2}}, \text{ as } k \to \infty. \tag{9.44}$$


Remark. The best known lower bound is just $\sqrt{2}$ times the above estimate; see [2].
Thus, a real chasm lies between the best known upper bound and the best known 
lower bound! 


Proof. Consider a random two-coloring of the graph $K_n$, where each edge is colored
red or blue with equal probability, and independently of what occurs at other edges.
Let W be a clique in $K_n$ of size k, with $3 \le k \le n$. Let $I_W$ be the indicator random
variable, which is equal to 1 if W is monochromatic, and equal to 0 otherwise.
Since there are $\binom{k}{2}$ edges in W, the probability that W is all blue (or all red) is
$(\frac{1}{2})^{\binom{k}{2}}$; consequently, the probability that W is monochromatic is $2^{1-\binom{k}{2}}$. Of course,
the expected value $EI_W$ of $I_W$ is also equal to $2^{1-\binom{k}{2}}$.

For $3 \le k \le n$, let $X_k = \sum_{|W|=k} I_W$. The random variable $X_k$ counts the
number of monochromatic cliques of size k in $K_n$. We have

$$EX_k = \sum_{|W|=k} EI_W = \binom{n}{k} 2^{1-\binom{k}{2}}.$$

Since the average number of monochromatic cliques of size k in this random
two-coloring is equal to $\binom{n}{k} 2^{1-\binom{k}{2}}$, there certainly must exist some particular
two-coloring with exactly M monochromatic cliques of size k, for some $M \le \binom{n}{k} 2^{1-\binom{k}{2}}$.
Consider such a two-coloring. From each of the M monochromatic cliques of size k,
remove one of the vertices. Let M′ denote the number of vertices removed. We have
$M' \le M$. (It is possible that $M' < M$ because we might have removed the same
vertex from more than one of the cliques.) What remains is a two-coloring of the
complete graph on $n - M'$ vertices, and by construction, this two-coloring has no
monochromatic cliques of size k. We conclude that

$$R(k) > n - \binom{n}{k} 2^{1-\binom{k}{2}}, \text{ for any } n \ge k.$$
In particular, choosing n = k + 1, one obtains $R(k) > k + 1 - 2(k + 1)2^{-\binom{k}{2}}$, and
it is easy to check that the right hand side is greater than or equal to k, for all $k \ge 3$.
In Exercise 9.3 the reader is asked to show that

$$\max_{k \le n < \infty} \left(n - \binom{n}{k} 2^{1-\binom{k}{2}}\right) = \frac{1}{e}(1 + o(1))\, k\, 2^{\frac{k}{2}}, \text{ as } k \to \infty. \tag{9.45}$$

□
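The maximization in (9.45) is easy to carry out numerically for moderate k, because $f(n) = n - \binom{n}{k}2^{1-\binom{k}{2}}$ has increments $f(n+1) - f(n) = 1 - \binom{n}{k-1}2^{1-\binom{k}{2}}$ that decrease in n, so f increases exactly while $\binom{n}{k-1} < 2^{\binom{k}{2}-1}$. The Python sketch below (function name hypothetical) finds the maximizing integer n with an exact integer comparison and compares the resulting lower bound for R(k) with $\frac{1}{e}k2^{k/2}$.

```python
import math

def alteration_lower_bound(k: int):
    """Return (max f, argmax n) for f(n) = n - C(n, k) * 2^(1 - C(k, 2)) over integers n >= k."""
    e = math.comb(k, 2) - 1                # 2^(1 - C(k, 2)) = 1 / 2^e
    n = k
    while math.comb(n, k - 1) < 2 ** e:    # f(n + 1) - f(n) > 0 exactly in this case
        n += 1
    return n - math.comb(n, k) / 2 ** e, n

for k in (3, 10, 20):
    bound, n_star = alteration_lower_bound(k)
    print(k, n_star, bound, bound / (k * 2 ** (k / 2) / math.e))
```

For k = 20 the printed ratio is already close to 1; for k = 3 the bound is only R(3) > 3.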


Remark. The strategy used to prove Theorem 9.4 is known as the probabilistic
method. It was pioneered by P. Erdős. He used the method in a slightly different
way from above and obtained a lower bound on R(k) with an extra factor of $\sqrt{2}$ in
the denominator on the right hand side of (9.44).


Exercise 9.1. Show that the total variation distance $D_{TV}(\mu, \nu)$ defined in (9.30)
satisfies (9.31).


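Exercise 9.2. This exercise presents an alternative approach in place of the final
two paragraphs of the proof of part (ii) of Theorem 9.2. Recall that the Cauchy–Schwarz
inequality states that $|\sum_{i=1}^m a_i b_i| \le (\sum_{i=1}^m a_i^2)^{\frac12} (\sum_{i=1}^m b_i^2)^{\frac12}$, where
$\{a_i\}_{i=1}^m$, $\{b_i\}_{i=1}^m$ are real numbers and m is a positive integer.

a. Use (9.36) and the Cauchy–Schwarz inequality to show that for any $A \subset
\mathcal{P}([n]^{(2)})$, one has

$$|\mu_{n;\mathrm{tam}}(A) - \mu_n(A)| \le \sqrt{\mu_n(A)} \left(\sum_{\omega \in A} \left(\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\right)^2 \mu_n(\omega)\right)^{\frac12} \le \left(\sum_{\omega \in \mathcal{P}([n]^{(2)})} \left(\frac{N^{(k_n)}_{n,p}(\omega)}{EN^{(k_n)}_{n,p}} - 1\right)^2 \mu_n(\omega)\right)^{\frac12}. \tag{9.46}$$

b. The expression on the right hand side of (9.46) is called the $L^2$-norm with respect
to the measure $\mu_n$ of the function $\frac{N^{(k_n)}_{n,p}}{EN^{(k_n)}_{n,p}} - 1$, which is defined on the domain
$\mathcal{P}([n]^{(2)})$. We denote this norm by $\|\frac{N^{(k_n)}_{n,p}}{EN^{(k_n)}_{n,p}} - 1\|_{2;\mu_n}$. Use (9.16) (where the
notation $N_{n,p}(k_n)$ instead of $N^{(k_n)}_{n,p}$ is used), which holds for $k_n$ as in part (ii)
of Theorem 9.2, to prove that

$$\lim_{n\to\infty} \left\|\frac{N^{(k_n)}_{n,p}}{EN^{(k_n)}_{n,p}} - 1\right\|_{2;\mu_n} = 0. \tag{9.47}$$

c. Conclude from (9.46) and (9.47) that (9.42) holds.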
Exercise 9.3. Show that (9.45) holds. (Hint: Let $f_{1,k}(x) = x - \frac{x^k}{k!} 2^{1-\binom{k}{2}}$ and
$f_{2,k}(x) = x - \frac{(x-k)^k}{k!} 2^{1-\binom{k}{2}}$. Show that $\max_{k \le x < \infty} f_{1,k}(x) \sim \max_{k \le x < \infty} f_{2,k}(x)$
as $k \to \infty$. Since $\frac{(n-k)^k}{k!} \le \binom{n}{k} \le \frac{n^k}{k!}$, it then follows that $\max_{k \le n < \infty}(n -
\binom{n}{k} 2^{1-\binom{k}{2}}) \sim \max_{k \le x < \infty} f_{1,k}(x)$, as $k \to \infty$. To obtain the asymptotic behavior
of $\max_{k \le x < \infty} f_{1,k}(x)$, you will need Stirling's formula.)


Exercise 9.4. Figure 9.4 shows that the Ramsey number R(3) satisfies R(3) > 5. 
Prove that R(3) = 6. 


Chapter Notes 


For a wide scope of results concerning graphs, deterministic and random, see
Bollobás’ books [9] and [10].

For a paper that considers tampering detection, see [29]. In particular, one 
finds there two examples that show that the intuition for Theorem 9.2, discussed 
in the remark following the theorem, can fail. It should be noted that the word 
“detection” must be understood here in a very theoretical way, as there are no known 
algorithms for detecting this clique in a reasonable amount of time, namely an 
amount of time which grows no more than polynomially in the number of vertices n. 
The construction of such algorithms is known in the theoretical computer science 
literature as the “planted clique” problem. See, for example, the paper of Alon et al.
[3], where for p = 1/2 it is shown that a planted clique of order $n^{1/2}$ can be detected in
polynomial time. (This order for the clique is of course far, far larger than the order
log n for the cliques discussed in this chapter.)

The proof of the existence of the Ramsey number R(k) goes back to F. Ramsey
in 1930. The nice little book by Alon and Spencer [2] is devoted entirely to the 
probabilistic method in combinatorics. The book by Graham et al. [22] is devoted 
entirely to Ramsey theory. 


Chapter 10 

The Phase Transition Concerning the Giant 
Component in a Sparse Random Graph: 

A Theorem of Erdős and Rényi


10.1 Introduction and Statement of Results 


Let $G_n(p_n) = ([n], E_n(p_n))$ denote the Erdős–Rényi graph of size n which was
introduced in Chap. 9. As in Chap. 9, the generic notation P for probability and
E for expectation will be used in this chapter. Note that whereas in Chap. 9 the
edge probability p was fixed independent of the graph size, in this chapter the
edge probability $p_n$ will vary with n. A subset $A \subset [n]$ of the vertex set [n] is
called connected if for every $x, y \in A$, there exists a path between x and y along
edges in $E_n(p_n)$. The vertex set [n] is of course equal to the disjoint union of its
connected components. Let $C_n^{\mathrm{lg}}$ be the random variable denoting the size of the
largest connected component in the random graph $G_n(p_n)$. It turns out that the
size of the largest connected component undergoes a striking phase transition as
the edge probability passes from $\frac{c}{n}$ with $c < 1$ to $\frac{c}{n}$ with $c > 1$. In this chapter we
will prove the following two theorems.


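Theorem 10.1. Let $p_n = \frac{c}{n}$, with $c < 1$. Then there exists a $\gamma = \gamma(c)$ such that the
size $C_n^{\mathrm{lg}}$ of the largest connected component of $G_n(p_n)$ satisfies

$$\lim_{n\to\infty} P(C_n^{\mathrm{lg}} \le \gamma \log n) = 1.$$

Theorem 10.2. Let $p_n = \frac{c}{n}$, with $c > 1$. Then there exists a unique solution $\beta =
\beta(c) \in (0, 1)$ to the equation $1 - e^{-cx} - x = 0$. For any $\epsilon > 0$, the size $C_n^{\mathrm{lg}}$ of the
largest connected component of $G_n(p_n)$ satisfies

$$\lim_{n\to\infty} P\left((1 - \epsilon)\beta n \le C_n^{\mathrm{lg}} \le (1 + \epsilon)\beta n\right) = 1. \tag{10.1}$$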
R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 109 


DOI 10.1007/978-3-319-07965-3__10, © Springer International Publishing Switzerland 2014 
Furthermore, every other connected component of $G_n(p_n)$ is of size $O(\log n)$ as
$n \to \infty$; that is, letting $C_n^{\mathrm{2nd\text{-}lg}}$ denote the size of the second largest component,
then for some $\gamma = \gamma(c)$,

$$\lim_{n\to\infty} P(C_n^{\mathrm{2nd\text{-}lg}} \le \gamma \log n) = 1. \tag{10.2}$$
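The constant β(c) in Theorem 10.2 has no closed form, but it is easy to compute. The function $g(x) = 1 - e^{-cx} - x$ is strictly concave with $g(0) = 0$, $g'(0) = c - 1 > 0$ and $g(1) = -e^{-c} < 0$, so g is positive on $(0, \beta)$ and negative on $(\beta, 1]$; hence bisection on [0, 1] converges to β(c). A minimal Python sketch (function name hypothetical):

```python
import math

def giant_fraction(c: float, tol: float = 1e-12) -> float:
    """The unique root beta(c) in (0, 1) of 1 - exp(-c x) - x = 0, for c > 1, by bisection."""
    if c <= 1:
        raise ValueError("a root in (0, 1) exists only for c > 1")

    def g(x: float) -> float:
        return 1 - math.exp(-c * x) - x

    lo, hi = 0.0, 1.0                      # invariant: g > 0 just right of lo, g(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for c in (1.5, 2.0, 3.0):
    print(c, giant_fraction(c))
```

For example, β(2) ≈ 0.797, so for c = 2 the giant component contains roughly 80 percent of the vertices.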


Remark 1. In light of (10.1) and (10.2), when $p_n = \frac{c}{n}$ with $c > 1$, the largest
component is referred to as the giant component.


Remark 2. It follows from the above theorems that when $p_n = \frac{c}{n}$ for some $c > 0$,
the probability that the graph is connected approaches 0 as $n \to \infty$. This can be
proved directly far more easily than the above theorems can be proved. Indeed, in
Exercise 10.3, the reader is guided through a proof of the following fact concerning
disconnected vertices, that is, vertices that are not connected to any other vertices:
If $p_n = \frac{\log n + c_n}{n}$, then as $n \to \infty$, the probability of there being at least
one disconnected vertex approaches 0 if $\lim_{n\to\infty} c_n = \infty$, while for any M,
the probability of there being at least M disconnected vertices approaches 1 if
$\lim_{n\to\infty} c_n = -\infty$. Actually, there is a totally trivial way to see that if $p_n = \frac{c}{n}$,
then the probability that the graph is connected does not approach 1. Indeed, simply
note that the probability that any particular vertex is disconnected is $(1 - \frac{c}{n})^{n-1}$; thus
as $n \to \infty$, the probability that any particular vertex is disconnected converges to
$e^{-c}$. The above theorems and this discussion naturally elicit the question, how large
must $p_n$ be in order that the graph be connected? The answer to this was given also
by Erdős and Rényi, who proved that the above threshold probability concerning
whether or not the graph possesses disconnected vertices is also the threshold for
connectivity: If $p_n = \frac{\log n + c_n}{n}$, then as $n \to \infty$, the probability of the graph being
connected approaches 1 if $\lim_{n\to\infty} c_n = \infty$ and approaches 0 if $\lim_{n\to\infty} c_n = -\infty$.
See [9].


Remark 3. If a connected component of a graph contains m vertices, then it must
contain at least m − 1 edges. Thus, it follows from (10.1) that if c > 1, then for any
$\epsilon > 0$ and for large n, with high probability, the random graph $G_n(\frac{c}{n})$ will contain
at least $(1 - \epsilon)\beta(c)n$ edges. In Sect. 9.1 of Chap. 9, it was shown that for any $\epsilon > 0$,
with high probability, the graph $G_n(p)$ has $\frac{1}{2}n^2p + O(n^{1+\epsilon})$ edges. The same type
of analysis shows that for any $\epsilon > 0$ and large n, with high probability the graph
$G_n(\frac{c}{n})$ has $\frac{c}{2}n + O(n^{\frac12+\epsilon})$ edges. Thus, one must have $\beta(c) \le \frac{c}{2}$, for $1 < c < 2$.
See Exercise 10.1.
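The phase transition described in Theorems 10.1 and 10.2 is easy to observe in simulation. The Python sketch below (function name hypothetical) samples $G_n(\frac{c}{n})$ pair by pair and tracks connected components with a union–find structure, which is a swapped-in technique and not the exploration algorithm of Sect. 10.2. For c = 2 the largest component should contain roughly $\beta(2)n \approx 0.797n$ vertices, while for c < 1 it should be tiny compared with n; the $O(n^2)$ loop over pairs limits it to n in the low thousands.

```python
import random

def largest_component_size(n: int, c: float, seed: int = 0) -> int:
    """Sample G_n(c/n) and return the size of its largest connected component."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:            # edge {i, j} present independently with prob. p
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

n = 1000
for c in (0.5, 2.0):
    print(c, largest_component_size(n, c) / n)
```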


In Sect. 10.2 we construct the setup that will be used for the proofs of 
Theorems 10.1 and 10.2. In particular, we construct and analyze probabilistically 
an algorithm that calculates for each vertex of the graph the size of the connected 
component to which it belongs. In Sect. 10.3 we present a couple of basic large 
deviations estimates that will be needed for the proofs of the theorems. The results 
of Sects. 10.2 and 10.3 will allow for a quick proof of Theorem 10.1 in Sect. 10.4. In 
Sect. 10.5, we give a concise presentation of the Galton–Watson branching process
and prove the most basic theorem of this subject, concerning the probability of
extinction. This will be used for one part of the proof of Theorem 10.2, which
is presented in Sect. 10.6. The proof of Theorem 10.2 requires considerably more 
technical work over and above that which is required for the proof of Theorem 10.1. 


10.2 Construction of the Setup for the Proofs 
of Theorems 10.1 and 10.2 


Let $x \in [n]$ be a vertex of the random graph. All the random quantities that we define
below depend on x and n, but we suppress this dependence in the notation. We
construct an algorithm that produces the connected component to which x belongs.
We begin by calling x “alive” and calling all of the other vertices in [n] “neutral.”
We define $Y_0 = 1$, to indicate that at the beginning there is one vertex that is alive.
Each of the neutral vertices y is now observed. If there is an edge connecting x to y,
that is, if $\{x, y\} \in E_n(p_n)$, then y is declared alive; if not, then y remains neutral.
After every such y has been checked, we declare x to be “dead.” We define $Y_1$ to
be the new number of vertices that are alive. We also say that at time t = 1 there
is one dead vertex. This ends the first step of the algorithm. We continue like this.
If at the end of step t there are $Y_t > 0$ vertices that are alive (and t dead vertices),
we begin step t + 1 by selecting one of the alive vertices (it doesn’t matter which
one) and call it z. Each of the currently neutral vertices y is now observed. If there is
an edge connecting z and y, then y is declared alive; if not, then y remains neutral.
After every such y has been checked, we declare z to be “dead.” We define $Y_{t+1}$ to
be the new number of vertices that are alive, and we say that at time t + 1 there are
t + 1 dead vertices. The process stops at the end of the step T for which $Y_T = 0$.
It follows that at the end of step T, there are T dead vertices. A little thought shows
that these dead vertices form the connected component to which x belongs. Thus
T is the size of the connected component to which x belongs. (The reader should
verify this.) See Fig. 10.1. Of course, T is a random variable since it depends on the
random edge configuration $E_n(p_n)$.
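The algorithm above can be sketched in Python as follows (function name hypothetical). As in the walkthrough of Fig. 10.1, the sketch always processes the lowest-numbered alive vertex next, although any choice would do. The usage example takes a graph consistent with that walkthrough, with edges {1, 2}, {1, 4} and {2, 5}; this is an assumption read off from the figure, whose graph might contain further edges among {1, 2, 4, 5} that the algorithm never checks.

```python
def explore_component(n: int, edges: set, x: int):
    """Run the alive/neutral/dead algorithm from vertex x on the graph ([n], edges).

    edges holds frozensets {u, v}; vertices are 1, ..., n. Returns (dead, Y, Z), with
    dead the final set of dead vertices, Y = [Y_0, ..., Y_T] and Z = [Z_1, ..., Z_T].
    """
    alive = {x}
    neutral = set(range(1, n + 1)) - {x}
    dead = set()
    Y, Z = [1], []                                   # Y_0 = 1
    while alive:
        z = min(alive)                               # any alive vertex may be chosen
        newly_alive = {y for y in neutral if frozenset((z, y)) in edges}  # each pair checked once
        neutral -= newly_alive
        alive = (alive | newly_alive) - {z}
        dead.add(z)
        Z.append(len(newly_alive))
        Y.append(len(alive))                         # equals Y_{t-1} + Z_t - 1
    return dead, Y, Z

edges = {frozenset(e) for e in [(1, 2), (1, 4), (2, 5)]}
dead, Y, Z = explore_component(6, edges, 1)
print(sorted(dead), Y, Z)   # [1, 2, 4, 5] [1, 2, 2, 1, 0] [2, 1, 0, 0]
```

The number of steps T is `len(dead)`, the size of the component of x.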

For $1 \le t \le T$, define $Z_t$ to be the number of neutral vertices that are declared
alive at step t. Then from the description of the algorithm, we have for $1 \le t \le T$,

$$Y_t = Y_{t-1} + Z_t - 1. \tag{10.3}$$


Assuming that $t \le T$, at the end of step t − 1, there are t − 1 dead vertices and
$Y_{t-1} > 0$ alive vertices. Thus there are $n - t - Y_{t-1} + 1$ neutral vertices. A key
feature of the above algorithm is that no pair of vertices is ever checked twice.
Consequently, for every pair of vertices that is checked, the probability of there
being an edge between them is equal to $p_n$, independently of what occurred when
checking other pairs of vertices. Thus, since $Z_t$ counts how many of the
$n - t - Y_{t-1} + 1$ neutral vertices have a common edge with the alive vertex z that has been
selected for implementing step t, and since the probability of there being an edge
from z to any given neutral vertex is $p_n$, it follows that $Z_t$ is distributed according to
the binomial distribution with parameters $n - t - Y_{t-1} + 1$ and $p_n$: for $1 \le t \le T$,

$$Z_t \sim \mathrm{Bin}(n - t - Y_{t-1} + 1, p_n). \tag{10.4}$$


y = 2: neutral → alive
y = 3: neutral → neutral
y = 4: neutral → alive
y = 5: neutral → neutral
y = 6: neutral → neutral
x = 1 is declared dead
Take alive vertex z = 2
y = 3: neutral → neutral
y = 5: neutral → alive
y = 6: neutral → neutral
z = 2 is declared dead
Take alive vertex z = 4
y = 3: neutral → neutral
y = 6: neutral → neutral
z = 4 is declared dead
Take alive vertex z = 5
y = 3: neutral → neutral
y = 6: neutral → neutral
z = 5 is declared dead
There are no more alive vertices, so the algorithm ends
Dead sites: {1, 2, 4, 5} = the connected component containing x = 1

Fig. 10.1 The algorithm


Of course, $Y_{t-1}$, which appears in the size parameter of the binomial distribution above, is itself a random variable. The meaning of (10.4) is that, conditioned on knowing that $Y_{t-1} = y$, we have $Z_t \sim \mathrm{Bin}(n - t - y + 1, p_n)$. Since no pair of vertices is ever checked twice, and since from (10.3), $Y_{t-1}$ only depends on $\{Z_s\}_{s=1}^{t-1}$, it follows that given the value of $Y_{t-1}$, and given that $T \ge t$, the random variable $Z_t$ and the random variables $\{Z_s\}_{s=1}^{t-1}$ are conditionally independent; that is, for all $m \ge 1$ and all $t \ge 2$,

$$P\big(Z_t \in \cdot\,,\ \{Z_s\}_{s=1}^{t-1} \in \cdot \mid Y_{t-1} = m,\ T \ge t\big) = P\big(Z_t \in \cdot \mid Y_{t-1} = m,\ T \ge t\big)\, P\big(\{Z_s\}_{s=1}^{t-1} \in \cdot \mid Y_{t-1} = m,\ T \ge t\big). \tag{10.5}$$

As noted, (10.3) and (10.4) hold only up to time $T$; however, it will be convenient to define $Y_t$ and $Z_t$ recursively from (10.3) and (10.4) for all integers $0 \le t \le n$. (Thus, e.g., if $T = t_0$, then we have $Y_{t_0} = 0$ (as well as $Z_{t_0} = 0$), and thus $Z_{t_0+1} \sim \mathrm{Bin}(n - t_0, p_n)$ and $Y_{t_0+1} = Z_{t_0+1} - 1$.) In particular, for $t > T$, $Y_t$ can take on negative values. For $1 \le t \le T$, note that the number $N_t$ of neutral vertices at the end of step $t$ is given by $N_t = n - t - Y_t$. We use this equation to define $N_0$, namely, $N_0 = n - 1$, indicating that there are $n - 1$ neutral vertices before the first step begins. We now use this equation to extend $N_t$ also to all $0 \le t \le n$.
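The exploration algorithm described above is easy to simulate. The following Python sketch is our own illustration, not code from the text: the function name `explore_component` and the labeling of vertices as $0, \dots, n-1$ are assumptions. Because no pair is ever checked twice, each edge can be sampled at the moment its pair is examined, so no graph is stored. The sketch runs the algorithm only up to time $T$; it does not implement the extension of $Y_t$ past $T$.

```python
import random


def explore_component(n, p, x=0, rng=None):
    """Alive/neutral/dead exploration of the component of x in G_n(p).

    Vertices are 0..n-1. Each pair is examined at most once, so its edge is
    sampled on the spot with probability p. Returns (T, ys) with ys[t] = Y_t.
    """
    rng = rng if rng is not None else random.Random()
    neutral = set(range(n)) - {x}
    alive = [x]
    ys = [1]                                   # Y_0 = 1
    while alive:                               # step t happens while Y_{t-1} > 0
        alive.pop()                            # the chosen alive vertex z; dead after this step
        new = [y for y in neutral if rng.random() < p]   # the Z_t newly alive vertices
        neutral.difference_update(new)
        alive.extend(new)
        ys.append(len(alive))                  # Y_t = Y_{t-1} + Z_t - 1
    return len(ys) - 1, ys                     # T = number of dead vertices


T, ys = explore_component(1000, 0.5 / 1000, rng=random.Random(0))
print("component size T =", T)
```

With $p = 1$ the first step makes all $n - 1$ other vertices alive and the trajectory is $Y = (1, n-1, n-2, \dots, 0)$, which is a quick sanity check of the recursion (10.3).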
We have the following key lemma. 
Lemma 10.1.
$Y_t - 1 + t \sim \mathrm{Bin}\big(n - 1,\ 1 - (1 - p_n)^t\big)$, $t \ge 0$.


Proof. Since $N_t = n - t - Y_t = (n - 1) - (Y_t - 1 + t)$, the statement of the lemma is equivalent to

$$N_t \sim \mathrm{Bin}\big(n - 1, (1 - p_n)^t\big). \tag{10.6}$$

We prove (10.6) by induction. Clearly (10.6) holds for $t = 0$. Now assume that for some $t \ge 1$,

$$N_{t-1} \sim \mathrm{Bin}\big(n - 1, (1 - p_n)^{t-1}\big). \tag{10.7}$$

Using (10.3), we have

$$N_t = n - t - Y_t = n - t - Y_{t-1} + 1 - Z_t = N_{t-1} - Z_t. \tag{10.8}$$

Moreover, from (10.4) and the definition of $N_{t-1}$, we have $Z_t \sim \mathrm{Bin}(N_{t-1}, p_n)$. Thus, $N_{t-1} - Z_t \sim \mathrm{Bin}(N_{t-1}, 1 - p_n)$, and it follows from (10.8) that

$$N_t \sim \mathrm{Bin}(N_{t-1}, 1 - p_n). \tag{10.9}$$

By the inductive hypothesis (10.7), $N_{t-1}$ is the number of heads in $n - 1$ independent coin flips, where on each flip the probability of heads is $(1 - p_n)^{t-1}$. Then (10.9) states that $N_t$ is the number of “successes” in $n - 1$ independent trials, where each trial consists of first tossing a coin with probability $(1 - p_n)^{t-1}$ of heads and then tossing a second coin with probability $1 - p_n$ of heads, and a “success” is defined as obtaining heads on both flips. This description of $N_t$ is the description of a random variable distributed according to $\mathrm{Bin}(n - 1, (1 - p_n)^t)$. For an alternative derivation that (10.9) and (10.7) imply (10.6), using generating functions, see Exercise 10.4.

□
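Lemma 10.1 can also be checked exactly on a small example. The sketch below is our own (the helper names `binom_pmf` and `law_of_N` are hypothetical): it computes the law of $N_t$ by iterating the one-step thinning (10.9) from the point mass $N_0 = n - 1$, and compares it with $\mathrm{Bin}(n-1, (1-p_n)^t)$. Agreement is up to floating-point rounding only.

```python
from math import comb


def binom_pmf(m, q):
    """pmf of Bin(m, q) as a list indexed by k = 0, ..., m."""
    return [comb(m, k) * q**k * (1 - q) ** (m - k) for k in range(m + 1)]


def law_of_N(n, p, t):
    """Exact law of N_t from N_0 = n - 1 and N_s | N_{s-1} ~ Bin(N_{s-1}, 1 - p), i.e. (10.9)."""
    dist = [0.0] * (n - 1) + [1.0]            # point mass at N_0 = n - 1
    for _ in range(t):
        new = [0.0] * n
        for m, w in enumerate(dist):
            if w > 0.0:
                for k, b in enumerate(binom_pmf(m, 1 - p)):
                    new[k] += w * b           # mix Bin(m, 1 - p) with weight P(N_{s-1} = m)
        dist = new
    return dist


n, p, t = 30, 0.1, 5
gap = max(abs(a - b) for a, b in zip(law_of_N(n, p, t), binom_pmf(n - 1, (1 - p) ** t)))
print(f"largest pmf discrepancy: {gap:.2e}")   # rounding-level, as (10.6) predicts
```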


10.3 Some Basic Large Deviations Estimates 


We present two propositions which are known as large deviations estimates. The 
first proposition will be used in the proof of Theorem 10.1 and the second one will 
be used in the proof of Theorem 10.2. 


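Proposition 10.1. Let $c \in (0,1)$. For $n \in \mathbb{Z}^+$ and $t > 0$ with $\frac{ct}{n} \le 1$, let $S_{n,t} \sim \mathrm{Bin}\big(n, \frac{ct}{n}\big)$. Then there exists a $\kappa = \kappa(c) > 0$, independent of $n$ and $t$, such that

$$P(S_{n,t} \ge t) \le e^{-\kappa t}.$$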
Remark. Note that $ES_{n,t} = n\big(\frac{ct}{n}\big) = ct < t$, since $c \in (0,1)$.


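Proof. For any $\lambda > 0$, we have

$$P(S_{n,t} \ge t) \le \exp(-\lambda t)\, E\exp(\lambda S_{n,t}), \tag{10.10}$$

since $\exp(\lambda(S_{n,t} - t)) \ge 1$ on the event $\{S_{n,t} \ge t\}$.

Since $S_{n,t}$ is the number of successes in $n$ independent Bernoulli trials, each of which has probability $\frac{ct}{n}$ of success, it follows that $S_{n,t}$ can be represented as $S_{n,t} = \sum_{j=1}^n B_j$, where the $\{B_j\}_{j=1}^n$ are independent and identically distributed Bernoulli random variables with parameter $\frac{ct}{n}$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = \frac{ct}{n}$. Using the fact that these random variables are independent and identically distributed, we have

$$E\exp(\lambda S_{n,t}) = E\exp\Big(\lambda\sum_{j=1}^n B_j\Big) = \prod_{j=1}^n E\exp(\lambda B_j) = \big(E\exp(\lambda B_1)\big)^n = \Big(1 - \frac{ct}{n} + \frac{ct}{n}e^{\lambda}\Big)^n. \tag{10.11}$$

Since $1 + y \le e^y$ for all $y$, we have $\big(1 + \frac{x}{n}\big)^n \le e^x$ for all $x \ge 0$ and all $n \ge 1$. Thus, $\big(1 - \frac{ct}{n} + \frac{ct}{n}e^{\lambda}\big)^n \le e^{ct(e^{\lambda} - 1)}$, and consequently, (10.10) and (10.11) give for any $\lambda > 0$

$$P(S_{n,t} \ge t) \le e^{-\lambda t}e^{ct(e^{\lambda} - 1)} = \exp\big(-(\lambda - ce^{\lambda} + c)t\big). \tag{10.12}$$

The function $f(\lambda) := \lambda - ce^{\lambda} + c$ satisfies $f(0) = 0$ and $f'(0) = 1 - c > 0$, since $c \in (0,1)$. Thus, there exist $\kappa = \kappa(c) > 0$ and $\lambda_0 > 0$ such that $f(\lambda_0) = \kappa$. We then conclude from (10.12) that $P(S_{n,t} \ge t) \le e^{-\kappa t}$. □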
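As a numerical illustration of Proposition 10.1, the sketch below compares the exact tail $P(S_{n,t} \ge t)$ with $e^{-\kappa t}$. Setting $f'(\lambda) = 1 - ce^{\lambda} = 0$ gives $\lambda = -\log c > 0$ and the value $f(-\log c) = c - 1 - \log c$; using that value as $\kappa(c)$ is our choice, since the proof only needs some positive value. The function names are hypothetical, and $t$ is taken to be an integer.

```python
from math import comb, exp, log


def kappa(c):
    """Maximum of f(lambda) = lambda - c e^lambda + c, attained at lambda = -log c.
    It is one admissible kappa(c) in Proposition 10.1 when 0 < c < 1."""
    return c - 1 - log(c)


def tail(n, q, t):
    """P(S >= t) for S ~ Bin(n, q) and integer t."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(t, n + 1))


c, n = 0.5, 200
for t in (1, 5, 10, 20, 40):
    print(t, tail(n, c * t / n, t), exp(-kappa(c) * t))   # exact tail vs. bound
```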


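Proposition 10.2. For each $n \in \mathbb{Z}^+$, let $S_n \sim \mathrm{Bin}(n, p)$, where $p \in (0, 1)$. Let

$$\kappa(p_0, p) = p_0\log\frac{p_0}{p} + (1 - p_0)\log\frac{1 - p_0}{1 - p}, \quad 0 < p, p_0 < 1.$$

Then $\kappa(p_0, p) > 0$ if $p \ne p_0$, and
(i) if $p < p_0 < 1$, then $P(S_n \ge p_0 n) \le e^{-\kappa(p_0, p)n}$, for all $n$;
(ii) if $0 < p_0 < p$, then $P(S_n \le p_0 n) \le e^{-\kappa(p_0, p)n}$, for all $n$.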
Remark. The function $\kappa(p_0, p)$ is a relative entropy. For more about this, see the
notes at the end of the chapter. 


Proof. The following three facts show that (ii) follows from (i): $S'_n := n - S_n$ is distributed according to the distribution $\mathrm{Bin}(n, 1 - p)$, $P(S_n \le p_0 n) = P(S'_n \ge (1 - p_0)n)$, and $\kappa(1 - p_0, 1 - p) = \kappa(p_0, p)$. So it suffices to show that (i) holds and that $\kappa(p_0, p) > 0$ if $p \ne p_0$.

Let $p_0 > p$. For any $\lambda > 0$, we have

$$P(S_n \ge p_0 n) \le \exp(-\lambda p_0 n)\, E\exp(\lambda S_n), \tag{10.13}$$

since $\exp(\lambda(S_n - p_0 n)) \ge 1$ on the event $\{S_n \ge p_0 n\}$. We can represent the random variable $S_n$ as $S_n = \sum_{j=1}^n B_j$, where the $\{B_j\}_{j=1}^n$ are independent and identically distributed Bernoulli random variables with parameter $p$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = p$. Using the fact that these random variables are independent and identically distributed, we have

$$E\exp(\lambda S_n) = E\exp\Big(\lambda\sum_{j=1}^n B_j\Big) = \prod_{j=1}^n E\exp(\lambda B_j) = \big(E\exp(\lambda B_1)\big)^n = (pe^{\lambda} + 1 - p)^n. \tag{10.14}$$

Thus, from (10.13), we obtain the inequality

$$P(S_n \ge p_0 n) \le \big[e^{-\lambda p_0}(pe^{\lambda} + 1 - p)\big]^n, \quad \text{for all } n \ge 1 \text{ and all } \lambda > 0. \tag{10.15}$$

The function $f(\lambda) := e^{-\lambda p_0}(pe^{\lambda} + 1 - p)$, $\lambda \ge 0$, satisfies $f(0) = 1$, $\lim_{\lambda\to\infty} f(\lambda) = \infty$, and $f'(0) = -p_0 + p < 0$. Consequently, $f$ possesses a global minimum at some $\lambda_0 > 0$, and $f(\lambda_0) \in (0,1)$ [indeed, $f(\lambda_0) \le 0$ would contradict (10.15)]. In Exercise 10.5, the reader is asked to show that $f(\lambda_0) = \big(\frac{p}{p_0}\big)^{p_0}\big(\frac{1-p}{1-p_0}\big)^{1-p_0}$. Note now that $\kappa(p_0, p)$, defined in the statement of the proposition, is equal to $-\log f(\lambda_0)$. Thus, $\kappa(p_0, p) > 0$ for $p_0 > p$, and $e^{-\kappa(p_0, p)} = f(\lambda_0)$. Substituting $\lambda = \lambda_0$ in (10.15) gives

$$P(S_n \ge p_0 n) \le e^{-\kappa(p_0, p)n}.$$

Finally, since $\kappa(p_0, p) = \kappa(1 - p_0, 1 - p)$, it follows that $\kappa(p_0, p) > 0$ if $p \ne p_0$. □
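Both parts of Proposition 10.2 can be compared with exact binomial tails. The sketch below (function names are ours) also evaluates $f$ at its critical point $\lambda_0 = \log\frac{p_0(1-p)}{p(1-p_0)}$, which is where $f'(\lambda) = 0$, and checks numerically that $f(\lambda_0) = e^{-\kappa(p_0,p)}$, as stated in the proof.

```python
from math import comb, exp, log


def kappa(p0, p):
    """Relative entropy kappa(p0, p) of Proposition 10.2."""
    return p0 * log(p0 / p) + (1 - p0) * log((1 - p0) / (1 - p))


def pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper(n, p, p0):
    """P(S_n >= p0 n) for S_n ~ Bin(n, p)."""
    return sum(pmf(n, p, k) for k in range(n + 1) if k >= p0 * n)


def lower(n, p, p0):
    """P(S_n <= p0 n) for S_n ~ Bin(n, p)."""
    return sum(pmf(n, p, k) for k in range(n + 1) if k <= p0 * n)


# Critical point of f(lambda) = e^{-lambda p0} (p e^lambda + 1 - p), for p < p0.
p, p0 = 0.3, 0.5
lam0 = log(p0 * (1 - p) / (p * (1 - p0)))
f_min = exp(-lam0 * p0) * (p * exp(lam0) + 1 - p)
print(f_min, exp(-kappa(p0, p)))   # the two numbers agree up to rounding
```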


10.4 Proof of Theorem 10.1 


In this section, and also in Sect. 10.6, we will use tacitly the following facts, which 
are left to the reader in Exercise 10.6: 
1. If $X_i \sim \mathrm{Bin}(n_i, p)$, $i = 1, 2$, and $n_1 \ge n_2$, then $P(X_1 \ge k) \ge P(X_2 \ge k)$, for all integers $k \ge 0$.

2. If $X_i \sim \mathrm{Bin}(n, p_i)$, $i = 1, 2$, and $p_1 \ge p_2$, then $P(X_1 \ge k) \ge P(X_2 \ge k)$, for all integers $k \ge 0$.


We assume that $p_n = \frac{c}{n}$ with $c \in (0, 1)$. From the analysis in Sect. 10.2, we have seen that for $x \in [n]$, the size of the connected component of $G_n(p_n)$ containing $x$ is given by $T = \min\{t \ge 0 : Y_t = 0\}$. (As noted in Sect. 10.2, the quantities $T$ and $Y_t$ depend on $x$ and $n$, but this dependence is suppressed in the notation.) Let $\hat Y_t$ be a random variable distributed according to the distribution $\mathrm{Bin}\big(n - 1, 1 - (1 - \frac{c}{n})^t\big)$. Then from Lemma 10.1,

$$P(T > t) \le P(Y_t > 0) = P(\hat Y_t > t - 1) = P(\hat Y_t \ge t).$$

(The inequality above is not an equality because we have continued the definition of $Y_t$ past the time $T$.) Let $\bar Y_t$ be a random variable distributed according to the distribution $\mathrm{Bin}\big(n - 1, \frac{ct}{n}\big)$. By Taylor's remainder formula, $(1 - x)^t \ge 1 - tx$, for $x \ge 0$ and $t$ a positive integer. Thus, $\frac{ct}{n} \ge 1 - (1 - \frac{c}{n})^t$, and consequently $P(\hat Y_t \ge t) \le P(\bar Y_t \ge t)$. Thus, we have

$$P(T > t) \le P(\bar Y_t \ge t). \tag{10.16}$$

If $S_{n,t} \sim \mathrm{Bin}\big(n, \frac{ct}{n}\big)$ as in Proposition 10.1, then $P(\bar Y_t \ge t) \le P(S_{n,t} \ge t)$. Using this with (10.16) and Proposition 10.1, we conclude that there exists a $\kappa > 0$ such that

$$P(T > t) \le e^{-\kappa t}, \quad t \ge 0,\ n \ge 1. \tag{10.17}$$

Let $\gamma > 0$ satisfy $\gamma\kappa > 1$. Then from (10.17) we have

$$P(T > \gamma\log n) \le e^{-\kappa\gamma\log n} = n^{-\kappa\gamma}. \tag{10.18}$$

We have proven that the probability that the connected component containing $x$ is larger than $\gamma\log n$ is no greater than $n^{-\kappa\gamma}$. There are $n$ vertices in $G_n(p_n)$; thus the probability that at least one of them is in a connected component larger than $\gamma\log n$ is certainly no larger than $n \cdot n^{-\kappa\gamma} = n^{1-\kappa\gamma} \to 0$ as $n \to \infty$. This completes the proof of Theorem 10.1. □
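Theorem 10.1 can be illustrated (not proved) by sampling $G_n(\frac{c}{n})$ with $c < 1$ and comparing the largest component with $\log n$. The sketch below is a plain Monte Carlo experiment of our own; it uses a union–find structure, which is our choice and plays no role in the text's argument, and the seed and sizes are arbitrary.

```python
import math
import random
from collections import Counter


def component_sizes(n, p, rng):
    """Sample G_n(p) pair by pair; return its component sizes, largest first."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path halving
            v = parent[v]
        return v

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:              # edge {u, v} present with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    return sorted(Counter(find(v) for v in range(n)).values(), reverse=True)


n, c = 1000, 0.5
sizes = component_sizes(n, c / n, random.Random(1))
print(sizes[0], round(math.log(n), 2))   # largest component vs. log n
```

A single sample says nothing about the constant $\gamma$; repeating the experiment for growing $n$ shows the largest component growing roughly like a multiple of $\log n$.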


10.5 The Galton–Watson Branching Process


We define a random population process in discrete time. Let $\{q_n\}_{n=0}^\infty$ be a nonnegative sequence satisfying $\sum_{n=0}^\infty q_n = 1$. We will refer to $\{q_n\}_{n=0}^\infty$ as the offspring distribution of the process. Consider an initial particle alive at time $t = 0$ and set $X_0 = 1$ to indicate that the size of the initial population is 1. At time $t = 1$, this particle gives birth to a random number of offspring and then dies. For each integer $n \ge 0$, the probability that there were $n$ offspring is $q_n$. Let $X_1$ denote the population size at time 1, namely the number of offspring of the initial particle. In general, at any time $t \ge 1$, all of the $X_{t-1}$ particles alive at time $t - 1$ give birth to random numbers of offspring and die. The new number of particles is $X_t$. The numbers of offspring of the different particles throughout all the generations are assumed independent of one another and are all distributed according to the same offspring distribution $\{q_n\}_{n=0}^\infty$.

The random population process $\{X_t\}_{t=0}^\infty$ is called a Galton–Watson branching process. Clearly, if $X_t = 0$ for some $t$, then $X_r = 0$ for all $r \ge t$. If this occurs, we say the process becomes extinct; otherwise we say that the process survives. See Fig. 10.2. If $q_0 = 0$, then the probability of survival is 1. Otherwise, there is a positive probability of extinction, since at any time $t - 1$, there is a positive probability (namely $q_0^{X_{t-1}}$) that all of the particles die without leaving any offspring, in which case $X_t = 0$. The most fundamental question we can ask about this process is whether it has a positive probability of surviving.

Fig. 10.2 A realization of a branching process that becomes extinct at $n = 5$

Let $W$ be a random variable distributed according to the offspring distribution: $P(W = n) = q_n$. Let

$$\mu = EW = \sum_{n=0}^\infty nq_n$$

denote the mean number of offspring of a particle. It is easy to show that $EX_{t+1} = \mu EX_t$ (Exercise 10.7), from which it follows that $EX_t = \mu^t$, $t \ge 0$. From this, it follows that if $\mu < 1$, then $\lim_{t\to\infty} EX_t = 0$. Since $EX_t \ge P(X_t \ge 1)$, it follows that $\lim_{t\to\infty} P(X_t \ge 1) = 0$, which means that the process has probability 1 of extinction. The fact that $EX_t$ is growing exponentially in $t$ when $\mu > 1$ would suggest, but not prove, that for $\mu > 1$ the probability of extinction is less than 1. In fact, we can use the method of generating functions to prove the following result. Define

$$\phi(s) = \sum_{n=0}^\infty q_n s^n, \quad s \in [0, 1]. \tag{10.19}$$

The function $\phi(s)$ is the probability generating function for the distribution $\{q_n\}_{n=0}^\infty$.


Theorem 10.3. Consider a Galton–Watson branching process with offspring distribution $\{q_n\}_{n=0}^\infty$, where $q_0 > 0$. Let $\mu = \sum_{n=0}^\infty nq_n \in [0, \infty]$ denote the mean number of offspring of a particle.

(i) If $\mu \le 1$, then the Galton–Watson process becomes extinct with probability 1.

(ii) If $\mu > 1$, then the Galton–Watson process becomes extinct with probability $\alpha \in (0,1)$, where $\alpha$ is the unique root $s \in (0, 1)$ of the equation $\phi(s) = s$.


Proof. If $q_0 + q_1 = 1$, then necessarily $\mu < 1$. Thus, it follows from the paragraph before the statement of the theorem that extinction occurs with probability 1. Assume now that $q_0 + q_1 < 1$. Since the power series for $\phi(s)$ converges uniformly for $s \in [0, 1 - \epsilon]$, for any $\epsilon > 0$, it follows that we can differentiate term by term to get

$$\phi'(s) = \sum_{n=0}^\infty nq_n s^{n-1} \ge 0, \qquad \phi''(s) = \sum_{n=0}^\infty n(n-1)q_n s^{n-2} \ge 0, \quad 0 \le s < 1.$$

In particular then, since $q_0 + q_1 < 1$, $\phi$ is a strictly convex function on $[0, 1]$, and consequently, so is $\psi(s) := \phi(s) - s$. We have $\psi(0) = q_0 > 0$ and $\psi(1) = 0$. Also, $\lim_{s\to1^-}\psi'(s) = \lim_{s\to1^-}\phi'(s) - 1 = \mu - 1$. Since $\psi$ is strictly convex, it follows that if $\mu \le 1$, then $\psi'(s) < 0$ for $s \in [0, 1)$, and consequently $\psi(s) > 0$, for $s \in [0, 1)$. However, if $\mu > 1$, then $\psi'(s) > 0$ for $s < 1$ and sufficiently close to 1. Using this along with the strict convexity and the fact that $\psi(0) > 0$ and $\psi(1) = 0$, it follows that there exists a unique $\alpha \in (0, 1)$ such that $\psi(\alpha) = 0$ and that $\psi(s) > 0$, for $s \in (0, \alpha)$, and $\psi(s) < 0$, for $s \in (\alpha, 1)$. (The reader should verify this.) We have thus shown that

the smallest root $\alpha \in [0, 1]$ of the equation $\phi(z) = z$ satisfies $\alpha \in (0, 1)$ if $\mu > 1$, and $\alpha = 1$ if $\mu \le 1$. Furthermore, in the case $\mu > 1$, one has $\phi(s) > s$ for $s \in [0, \alpha)$, and $\phi(s) < s$ for $s \in (\alpha, 1)$. (10.20)


Now let $\kappa_t := P(X_t = 0)$ denote the probability that extinction has occurred by time $t$. Of course, $\kappa_0 = 0$. We claim that

$$\kappa_t = \phi(\kappa_{t-1}), \quad \text{for } t \ge 1. \tag{10.21}$$

To prove this, first note that when $t = 1$, (10.21) says that $\kappa_1 = \phi(0) = q_0$, which is of course true. Now consider $t > 1$. We first calculate $P(X_t = 0 \mid X_1 = n)$, the probability that $X_t = 0$, conditioned on $X_1 = n$. By the conditioning, at time $t = 1$, there are $n$ particles, and each of these particles will contribute independently to the population size $X_t$ at time $t$, through $t - 1$ generations of branching. In order to have $X_t = 0$, each of these $n$ “new” branching processes must become extinct by time $t - 1$. The probability that any one of them becomes extinct by time $t - 1$ is, by definition, $\kappa_{t-1}$. By the independence, it follows that the probability that they all become extinct by time $t - 1$ is $\kappa_{t-1}^n$. We have thus proven that

$$P(X_t = 0 \mid X_1 = n) = \kappa_{t-1}^n.$$

Since $P(X_1 = n) = q_n$, we conclude that

$$\kappa_t = P(X_t = 0) = \sum_{n=0}^\infty P(X_1 = n)P(X_t = 0 \mid X_1 = n) = \sum_{n=0}^\infty q_n\kappa_{t-1}^n = \phi(\kappa_{t-1}),$$

proving (10.21). From its definition, $\kappa_t$ is nondecreasing, and $\kappa_{\mathrm{ext}} := \lim_{t\to\infty}\kappa_t$ is the extinction probability. Letting $t \to \infty$ in (10.21) gives

$$\kappa_{\mathrm{ext}} = \phi(\kappa_{\mathrm{ext}}). \tag{10.22}$$

It follows immediately from (10.22) and (10.20) that $\kappa_{\mathrm{ext}} = 1$ if $\mu \le 1$. If $\mu > 1$, then there are two roots $s \in [0, 1]$ of the equation $\phi(s) = s$, namely $s = \alpha$ and $s = 1$. If $\kappa_{\mathrm{ext}} = 1$, then $\kappa_t > \alpha$ for sufficiently large $t$, and then by (10.20) and (10.21), for such $t$, we have $\kappa_{t+1} = \phi(\kappa_t) < \kappa_t$, which contradicts the fact that $\kappa_t$ is nondecreasing. Thus, we conclude that $\kappa_{\mathrm{ext}} = \alpha$. □
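The proof is constructive: iterating (10.21) from $\kappa_0 = 0$ converges to the extinction probability. The sketch below is our own illustration for an offspring distribution with finite support, given as a list $[q_0, q_1, \dots]$; the function names and the stopping tolerance are assumptions. For $q = (\frac14, \frac14, \frac12)$ one has $\mu = \frac54$ and $\phi(s) = s$ has roots $\frac12$ and $1$, so the iteration should approach $\alpha = \frac12$; for $q = (\frac12, \frac14, \frac14)$ one has $\mu = \frac34$ and it approaches 1.

```python
def pgf(q, s):
    """phi(s) = sum_n q_n s^n for a finitely supported offspring law q = [q_0, q_1, ...]."""
    return sum(qn * s**n for n, qn in enumerate(q))


def extinction_probability(q, tol=1e-14, max_iter=10**6):
    """Iterate kappa_t = phi(kappa_{t-1}) from kappa_0 = 0, as in (10.21)."""
    k = 0.0
    for _ in range(max_iter):
        k_next = pgf(q, k)
        if abs(k_next - k) < tol:
            return k_next
        k = k_next
    return k


print(extinction_probability([0.25, 0.25, 0.5]))   # mu = 1.25: approximately 0.5
print(extinction_probability([0.5, 0.25, 0.25]))   # mu = 0.75: approximately 1.0
```

Near $\mu = 1$ the convergence becomes very slow, which is why a cap on the number of iterations is included.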


At one point in the proof of Theorem 10.2, we will use the above result on the extinction probability of a Galton–Watson branching process. However, we will need to consider this process in an alternative form. In the original formulation, at time $t$, the entire population of size $X_{t-1}$ that was alive at time $t - 1$ reproduces and dies, and then $X_t$ is the new population size. In other words, time $t$ referred to the $t$th generation of particles. In our alternative formulation, at each time $t$, only one of the particles that was alive at time $t - 1$ reproduces and dies. Thus, as before, we have $X_0 = 1$ to denote that we start with a single particle, and $X_1$ denotes the number of offspring that the original particle produces before it dies. At time $t = 2$, instead of having all $X_1$ particles reproduce and die simultaneously, we choose (arbitrarily) just one of these particles that was alive at time $t = 1$ and have it reproduce and die. Then $X_2$ is equal to the new total population. We continue in this way, at each step choosing just one of the particles that was alive at the previous step. Since in any case, the number of offspring of any particle is independent of the number of offspring of the other particles, it is clear that this new process has the same extinction probability as the original one.
10.6 Proof of Theorem 10.2


We assume that $p_n = \frac{c}{n}$ with $c > 1$. From the analysis in Sect. 10.2, we have seen that for $x \in [n]$, the size of the connected component of $G_n(p_n)$ containing $x$ is given by $T = \min\{t \ge 0 : Y_t = 0\}$.

Consider a Galton–Watson branching process $\{X_t\}_{t=0}^\infty$ in the alternative form described at the end of Sect. 10.5, and let the offspring distribution be the Poisson distribution with parameter $c$; that is, $q_n = e^{-c}\frac{c^n}{n!}$. The probability generating function of this distribution is given by

$$\phi(s) = \sum_{m=0}^\infty q_m s^m = e^{-c}\sum_{m=0}^\infty \frac{(cs)^m}{m!} = e^{c(s-1)}. \tag{10.23}$$

The expected number of offspring is equal to $c$. Since $c > 1$, it follows from Theorem 10.3 that the extinction time $T_{\mathrm{ext}} = \inf\{t \ge 1 : X_t = 0\}$ satisfies

$$P(T_{\mathrm{ext}} < \infty) = \alpha, \tag{10.24}$$

where $\alpha \in (0, 1)$ is the unique solution $s \in (0, 1)$ to the equation $\phi(s) = s$, that is, to the equation $e^{c(s-1)} = s$. Substituting $z = 1 - s$ in this equation, this becomes

$$1 - \alpha \text{ is the unique root } z \in (0, 1) \text{ of the equation } 1 - e^{-cz} - z = 0. \tag{10.25}$$
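Equation (10.25) has no closed-form solution, but $1 - \alpha(c)$ is easy to compute. The sketch below is our own (the name `beta` anticipates the $\beta$ of Theorem 10.2, which by (10.25) equals $1 - \alpha$); it uses bisection on $h(z) = 1 - e^{-cz} - z$. Since $1 - e^{-x} \ge x - \frac{x^2}{2}$, one has $h(z) \ge z\big((c-1) - \frac{c^2 z}{2}\big) > 0$ at $z = \frac{c-1}{c^2}$, while $h(1) = -e^{-c} < 0$, so the root lies between these two points. `math.expm1` is used to avoid cancellation for small $cz$.

```python
from math import exp, expm1


def beta(c, tol=1e-13):
    """Unique root z in (0, 1) of 1 - e^{-cz} - z = 0 for c > 1, i.e. 1 - alpha(c) in (10.25)."""
    if c <= 1:
        raise ValueError("the root in (0, 1) exists only for c > 1")

    def h(z):
        return -expm1(-c * z) - z               # 1 - e^{-cz} - z, computed stably

    lo, hi = (c - 1) / c**2, 1.0                # h(lo) > 0 > h(hi) = -e^{-c}
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c = 2.0
b = beta(c)
alpha = 1 - b
print(b, exp(c * (alpha - 1)) - alpha)   # beta(2) is about 0.797; residual near 0
```

Evaluating `beta` at several values of $c$ also shows numerically that $\alpha(c) = 1 - \beta(c)$ decreases in $c$, a fact used later in the proof.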

Let $\{W_t\}_{t=1}^\infty$ be a sequence of independent, identically distributed random variables distributed according to the Poisson distribution with parameter $c$. If $X_{t-1} \ne 0$, then $W_t$ will serve as the number of offspring of the particle chosen for reproduction and death from among the $X_{t-1}$ particles alive at time $t - 1$. Then we may represent the process $\{X_t\}_{t=0}^\infty$ by $X_0 = 1$ and

$$X_t = X_{t-1} + W_t - 1, \quad 1 \le t \le T_{\mathrm{ext}}. \tag{10.26}$$

If $T_{\mathrm{ext}} < \infty$, then of course $X_t = 0$ for all $t \ge T_{\mathrm{ext}}$. For any fixed $t \ge 1$, as soon as one knows the values of $\{W_s\}_{s=1}^t$, one knows the values of $\{X_s\}_{s=1}^t$. (We note that it might happen that these values of $\{W_s\}_{s=1}^t$ result in $X_{s_0} = 0$ for some $s_0 < t$, in which case the values of $\{W_s\}_{s=s_0+1}^t$ are superfluous for determining the values of $\{X_s\}_{s=1}^t$.) If $\vec r := \{r_s\}_{s=1}^t$ are the values obtained for $\{W_s\}_{s=1}^t$, let $\vec l := \{l_s\}_{s=1}^t$ denote the corresponding values for $\{X_s\}_{s=1}^t$. We write $\vec l = \vec l(\vec r)$. Note that $T_{\mathrm{ext}} \ge t$ occurs if and only if $l_s > 0$, for $0 \le s \le t - 1$, or, equivalently, if and only if $l_{t-1} > 0$.
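The representation (10.26) is also a convenient way to simulate the alternative-form process. The Monte Carlo sketch below is ours: the Poisson sampler is Knuth's multiplication method (a standard technique, not part of the text), and the finite horizon is an assumption, so the estimate is really of $P(T_{\mathrm{ext}} \le 200)$. For $c = 2$ this is indistinguishable in practice from $\alpha(2) \approx 0.203$, because a surviving path drifts upward by about $c - 1$ per step.

```python
import random
from math import exp


def poisson(rng, c):
    """Poisson(c) sample by Knuth's multiplication method (fine for moderate c)."""
    limit, k, prod = exp(-c), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def extinct_within(c, horizon, rng):
    """Run X_t = X_{t-1} + W_t - 1 from X_0 = 1, as in (10.26);
    return True if X hits 0 within `horizon` steps."""
    x = 1
    for _ in range(horizon):
        x += poisson(rng, c) - 1
        if x == 0:
            return True
    return False


rng = random.Random(7)
c, runs = 2.0, 2000
estimate = sum(extinct_within(c, 200, rng) for _ in range(runs)) / runs
print(estimate)   # Monte Carlo estimate of the extinction probability alpha(2)
```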

Now consider the process $\{Y_t\}_{t=0}^\infty$ introduced in Sect. 10.2. Recall that $T$ is equal to the smallest $t$ for which $Y_t = 0$. Note from (10.3) and (10.26) that $\{Y_t\}$ is defined recursively in a way very similar to the way $\{X_t\}$ is defined. The difference is that the independent sequence of random variables $\{W_t\}_{t=1}^\infty$ distributed according to the Poisson distribution with parameter $c$ is replaced by the sequence $\{Z_t\}_{t=1}^\infty$. The distribution of these latter random variables is given by (10.4) and (10.5). (As noted in Sect. 10.2, $Y_t$, $T$, and $Z_t$ depend on $x$ and $n$, but that dependence has been suppressed in the notation.) Because the form of the recursion formula is the same for $\{X_t\}$ and $\{Y_t\}$, and because $X_0 = Y_0 = 1$, it follows that if $\vec r = \{r_s\}_{s=1}^t$ are the values obtained for $\{Z_s\}_{s=1}^t$, and if $\vec l(\vec r)$ satisfies $l_{t-1} > 0$, then $\vec l(\vec r)$ are the corresponding values for $\{Y_s\}_{s=1}^t$.
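Since the random variables $\{W_s\}_{s=1}^t$ are independent, we have

$$P\big(\{W_s\}_{s=1}^t = \vec r\big) = \prod_{s=1}^t P(W_s = r_s). \tag{10.27}$$

By (10.4) and by the conditional independence condition (10.5), if $l_{t-1} > 0$, we have

$$P\big(\{Z_s\}_{s=1}^t = \vec r\big) = \prod_{s=1}^t P(Z_s = r_s \mid Y_{s-1} = l_{s-1}), \tag{10.28}$$

where, for convenience, we define $l_0 = 1$. By (10.4), the distribution of $Z_s$, conditioned on $Y_{s-1} = l_{s-1}$, is given by $\mathrm{Bin}\big(n - s - l_{s-1} + 1, \frac{c}{n}\big)$. Since $\lim_{n\to\infty}(n - s - l_{s-1} + 1)\frac{c}{n} = c$, it follows from the Poisson approximation to the binomial distribution (see Proposition A.3 in Appendix A) that

$$\lim_{n\to\infty} P(Z_s = r_s \mid Y_{s-1} = l_{s-1}) = P(W_s = r_s).$$

Thus, we conclude from (10.27) and (10.28) that for any fixed $t$,

$$\lim_{n\to\infty} P\big(\{Z_s\}_{s=1}^t = \vec r\big) = P\big(\{W_s\}_{s=1}^t = \vec r\big), \quad \text{for all } \vec r = \{r_s\}_{s=1}^t \text{ for which } l_{t-1}(\vec r) > 0,$$

and, consequently, that for any fixed $t$,

$$\lim_{n\to\infty} P\big(\{Y_s\}_{s=1}^t = \vec l\big) = P\big(\{X_s\}_{s=1}^t = \vec l\big), \quad \text{for all } \vec l = \{l_s\}_{s=1}^t \text{ satisfying } l_{t-1} > 0. \tag{10.29}$$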


Since $T_{\mathrm{ext}}$, the extinction time for $\{X_t\}_{t=0}^\infty$, is the smallest $t$ for which $X_t = 0$, and $T_{\mathrm{ext}} \ge t$ is equivalent to $l_{t-1} > 0$, and since $T$ is the smallest $t$ for which $Y_t = 0$, it follows from (10.29) that

$$\lim_{n\to\infty} P(T \le t) = P(T_{\mathrm{ext}} \le t), \quad \text{for any fixed } t \ge 1. \tag{10.30}$$
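The Poisson approximation behind (10.29) can be watched numerically. The sketch below (function names ours) measures, over $k \le 30$, the largest gap between the conditional law $\mathrm{Bin}(n - s - l + 1, \frac{c}{n})$ of $Z_s$ given $Y_{s-1} = l$ and the Poisson($c$) law of $W_s$, for a few values of $n$ with $s$ and $l$ held fixed.

```python
from math import comb, exp, factorial


def binom_pmf(m, q, k):
    return comb(m, k) * q**k * (1 - q) ** (m - k) if 0 <= k <= m else 0.0


def poisson_pmf(c, k):
    return exp(-c) * c**k / factorial(k)


def max_gap(n, c, s, l, kmax=30):
    """max over k <= kmax of |P(Z_s = k | Y_{s-1} = l) - P(W_s = k)|, with Z_s as in (10.4)."""
    m = n - s - l + 1
    return max(abs(binom_pmf(m, c / n, k) - poisson_pmf(c, k)) for k in range(kmax + 1))


gaps = [max_gap(n, c=2.0, s=3, l=2) for n in (50, 500, 5000)]
print(gaps)   # shrinks as n grows, in line with (10.29)
```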
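From (10.24), we have $\lim_{t\to\infty} P(T_{\mathrm{ext}} \le t) = \alpha$; thus, for any $\epsilon > 0$, there exists an integer $t_\epsilon$ such that $P(T_{\mathrm{ext}} \le t) \in (\alpha - \frac{\epsilon}{2}, \alpha)$, if $t \ge t_\epsilon$. It then follows from (10.30) that there exists an $n_{1,\epsilon} = n_{1,\epsilon}(t)$ such that

$$P(T \le t) \in (\alpha - \epsilon, \alpha + \epsilon), \quad \text{if } t \ge t_\epsilon \text{ and } n \ge n_{1,\epsilon}(t). \tag{10.31}$$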


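We now analyze the probabilities $P(\epsilon n \le T \le (1 - \alpha - \epsilon)n)$ and $P((1 - \alpha + \epsilon)n \le T \le n)$. We will show that these probabilities are very small. From (10.25), it follows that $1 - e^{-cz} > z$, for $z \in (0, 1 - \alpha)$, and $1 - e^{-cz} < z$, for $z \in (1 - \alpha, 1]$. Consequently, for $\epsilon > 0$, choosing $\delta = \delta(\epsilon)$ sufficiently small, we have

$$1 - e^{-cz} - 2\delta > z, \text{ for } z \in [\epsilon, 1 - \alpha - \epsilon]; \qquad 1 - e^{-cz} + 2\delta < z, \text{ for } z \in [1 - \alpha + \epsilon, 1]. \tag{10.32}$$

Since $T$ is the smallest $t$ for which $Y_t = 0$, we have

$$\{\epsilon n \le T \le (1 - \alpha - \epsilon)n\} \subset \bigcup_{\epsilon n \le t \le (1 - \alpha - \epsilon)n}\{Y_t \le 0\}.$$

(Recall that $Y_t$ has also been defined recursively for $t > T$ and can take on negative values for such $t$.) Thus, letting $\hat Y_t$ be the random variable distributed according to the distribution $\mathrm{Bin}\big(n - 1, 1 - (1 - \frac{c}{n})^t\big)$, it follows from Lemma 10.1 that

$$P(\epsilon n \le T \le (1 - \alpha - \epsilon)n) \le \sum_{\epsilon n \le t \le (1 - \alpha - \epsilon)n} P(\hat Y_t \le t - 1). \tag{10.33}$$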


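One has $\lim_{n\to\infty}\big(1 - \frac{c}{n}\big)^{bn} = e^{-cb}$, uniformly over $b$ in a bounded set. (The reader should verify this by taking the logarithm of $(1 - \frac{c}{n})^{bn}$ and applying Taylor's formula.) Applying this with $b = \frac{t}{n}$, with $0 \le t \le n$, it follows that $(1 - \frac{c}{n})^t - e^{-\frac{ct}{n}}$ is small for large $n$, uniformly over $t \in [0, n]$. Thus, for $\delta = \delta(\epsilon)$, which has been defined above, there exists an $n_{2,\delta} = n_{2,\delta}(\epsilon)$ such that $1 - (1 - \frac{c}{n})^t \ge 1 - e^{-\frac{ct}{n}} - \delta$, for $n \ge n_{2,\delta}$ and $0 \le t \le n$. Let $\tilde Y_t$ be a random variable distributed according to the distribution $\mathrm{Bin}\big(n - 1, 1 - e^{-\frac{ct}{n}} - \delta\big)$. Then $P(\hat Y_t \le t - 1) \le P(\tilde Y_t \le t - 1)$, $n \ge n_{2,\delta}$. Using this with (10.33), we obtain

$$P(\epsilon n \le T \le (1 - \alpha - \epsilon)n) \le \sum_{\epsilon n \le t \le (1 - \alpha - \epsilon)n} P(\tilde Y_t \le t - 1), \quad n \ge n_{2,\delta}. \tag{10.34}$$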


Every $t$ in the summation on the right hand side of (10.34) is of the form $t = b_t n$, with $\epsilon \le b_t \le 1 - \alpha - \epsilon$. Thus, it follows from (10.32) that $1 - e^{-\frac{ct}{n}} - \delta = 1 - e^{-cb_t} - \delta > b_t + \delta$. We now apply part (ii) of Proposition 10.2 with $n - 1$ in place of $n$, with $p = 1 - e^{-cb_t} - \delta$, and with $p_0 = b_t$. Note that $p$ and $p_0$ are bounded away from 0 and from 1 as $n$ varies and as $t$ varies over the above range. Also, we have $p > p_0 + \delta$. Consequently, there exists a constant $\kappa > 0$ such that $\kappa(p_0, p) \ge \kappa$, for all $p, p_0$ as above. Thus, we have for $n \ge n_{2,\delta}$,

$$P(\tilde Y_t \le t - 1) = P(\tilde Y_t \le b_t n - 1) \le P(\tilde Y_t \le b_t(n - 1)) \le e^{-\kappa(n-1)}. \tag{10.35}$$
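From (10.34) and (10.35) we conclude that

$$P(\epsilon n \le T \le (1 - \alpha - \epsilon)n) \le (1 - \alpha)ne^{-\kappa(n-1)}, \quad n \ge n_{2,\delta(\epsilon)}. \tag{10.36}$$

A very similar analysis shows that

$$P((1 - \alpha + \epsilon)n \le T \le n) \le \alpha ne^{-\kappa(n-1)}, \quad n \ge n_{3,\delta(\epsilon)}, \tag{10.37}$$

for some $n_{3,\delta} = n_{3,\delta}(\epsilon)$. This is left to the reader as Exercise 10.8.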
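We now analyze the probability $P(t \le T \le \epsilon n)$, for fixed $t$. As in (10.33), we have

$$P(t \le T \le \epsilon n) \le \sum_{t \le s \le \epsilon n} P(\hat Y_s \le s - 1), \tag{10.38}$$

where, we recall, $\hat Y_s$ is distributed according to the distribution $\mathrm{Bin}\big(n - 1, 1 - (1 - \frac{c}{n})^s\big)$. Let $Y'_s$ be a random variable distributed according to the distribution $\mathrm{Bin}\big(n - 1, (1 - \frac{c}{n})^s\big)$. Then

$$P(\hat Y_s \le s - 1) = P(Y'_s \ge n - s). \tag{10.39}$$

As in the proofs of Propositions 10.1 and 10.2, we have for any $\lambda > 0$

$$P(Y'_s \ge n - s) \le e^{-\lambda(n-s)}Ee^{\lambda Y'_s}. \tag{10.40}$$

We can represent the random variable $Y'_s$ as $Y'_s = \sum_{j=1}^{n-1} B_j$, where the $\{B_j\}_{j=1}^{n-1}$ are independent and identically distributed Bernoulli random variables with parameter $(1 - \frac{c}{n})^s$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = (1 - \frac{c}{n})^s$. Using the fact that these random variables are independent and identically distributed, we have

$$Ee^{\lambda Y'_s} = \prod_{j=1}^{n-1} Ee^{\lambda B_j} = \Big[\big(1 - \tfrac{c}{n}\big)^s e^{\lambda} + 1 - \big(1 - \tfrac{c}{n}\big)^s\Big]^{n-1} = \Big[\big(1 - \tfrac{c}{n}\big)^s(e^{\lambda} - 1) + 1\Big]^{n-1}. \tag{10.41}$$

Thus, from (10.40) and (10.41), we obtain

$$P(Y'_s \ge n - s) \le e^{-\lambda(n-s)}\Big[\big(1 - \tfrac{c}{n}\big)^s(e^{\lambda} - 1) + 1\Big]^{n-1}. \tag{10.42}$$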
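We now substitute $n = Ms$ in (10.42) to obtain

$$P(Y'_s \ge n - s) \le e^{-\lambda(M-1)s}\Big[\big(1 - \tfrac{c}{Ms}\big)^s(e^{\lambda} - 1) + 1\Big]^{Ms-1} \le e^{-\lambda(M-1)s}\Big[\big(1 - \tfrac{c}{Ms}\big)^s(e^{\lambda} - 1) + 1\Big]^{Ms} = e^{s\left[-\lambda(M-1) + M\log\left((1 - \frac{c}{Ms})^s(e^{\lambda} - 1) + 1\right)\right]},$$

for all $\lambda > 0$. (10.43)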


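We will show that for an appropriate choice of $\lambda > 0$, the expression in the square brackets above is negative and bounded away from 0 for all $s \ge 1$ and sufficiently large $M$. Let

$$f_{s,M}(\lambda) = -\lambda(M - 1) + M\log\Big(\big(1 - \tfrac{c}{Ms}\big)^s(e^{\lambda} - 1) + 1\Big).$$

Then $f_{s,M}(0) = 0$ and

$$f'_{s,M}(\lambda) = -(M - 1) + \frac{M(1 - \frac{c}{Ms})^s e^{\lambda}}{(1 - \frac{c}{Ms})^s(e^{\lambda} - 1) + 1}. \tag{10.44}$$

For any fixed $\lambda$, defining $g(y) = \frac{ye^{\lambda}}{y(e^{\lambda} - 1) + 1}$ for $y > 0$, it is easy to check that $g'(y) > 0$; therefore, $g$ is increasing. The last term on the right hand side of (10.44) is $Mg(y)$, with $y = (1 - \frac{c}{Ms})^s$. Since $1 - x \le e^{-x}$, for $x \ge 0$, we have $(1 - \frac{c}{Ms})^s \le e^{-\frac{c}{M}}$, if $n = Ms \ge c$, and thus the last term on the right hand side of (10.44) is bounded from above by $Mg(e^{-\frac{c}{M}})$, independent of $s$, for $s \ge \frac{c}{M}$. Thus, from (10.44), we have

$$f'_{s,M}(\lambda) \le -(M - 1) + \frac{Me^{-\frac{c}{M}}e^{\lambda}}{e^{-\frac{c}{M}}(e^{\lambda} - 1) + 1} = -M\Big(1 - \frac{e^{\lambda}}{e^{\lambda} - 1 + e^{\frac{c}{M}}}\Big) + 1 = 1 + M\,\frac{1 - e^{\frac{c}{M}}}{e^{\lambda} - 1 + e^{\frac{c}{M}}}, \quad \text{for all } s \ge \frac{c}{M}.$$

Since $\lim_{M\to\infty} M\frac{1 - e^{c/M}}{e^{\lambda} - 1 + e^{c/M}} = -ce^{-\lambda}$, uniformly over $\lambda \in [0, 1]$, and since $c > 1$, it follows that there exist a $\lambda_0 > 0$ and an $M_0$ such that if $\lambda \in [0, \lambda_0]$ and $M \ge M_0$, then $f'_{s,M}(\lambda) \le -\frac{c-1}{2}$, for all $s \ge 1$. It then follows that $f_{s,M}(\lambda_0) \le -\frac{\lambda_0(c-1)}{2}$, for all $M \ge M_0$ and $s \ge 1$. Choosing $\lambda = \lambda_0$ in (10.43) and using this last inequality for $f_{s,M}(\lambda_0)$, we conclude that

$$P(Y'_s \ge n - s) \le e^{-\frac{\lambda_0(c-1)}{2}s}, \quad \text{for } n \ge M_0 s,\ s \ge 1. \tag{10.45}$$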
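From (10.38), (10.39), and (10.45), we obtain the estimate

$$P(t \le T \le \epsilon n) \le \sum_{t \le s \le \epsilon n} e^{-\frac{\lambda_0(c-1)}{2}s} \le \sum_{s=t}^\infty e^{-\frac{\lambda_0(c-1)}{2}s} = \frac{e^{-\frac{\lambda_0(c-1)}{2}t}}{1 - e^{-\frac{\lambda_0(c-1)}{2}}}, \quad \text{if } \epsilon \le \frac{1}{M_0}. \tag{10.46}$$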
0 


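Now (10.31), (10.36), (10.37), and (10.46) guarantee that for any $\epsilon \in (0, 1)$, we can choose $t_\epsilon$ and $n_\epsilon$ such that for all $n \ge n_\epsilon$, one has

$$\begin{aligned} &\alpha - \epsilon < P(T \le t_\epsilon) < \alpha + \epsilon;\\ &P\big(T > t_\epsilon,\ T \notin ((1 - \alpha - \epsilon)n, (1 - \alpha + \epsilon)n)\big) < \epsilon;\\ &1 - \alpha - 2\epsilon < P\big(T \in ((1 - \alpha - \epsilon)n, (1 - \alpha + \epsilon)n)\big) < 1 - \alpha + \epsilon, \end{aligned} \tag{10.47}$$

for all $n \ge n_\epsilon$.

(The third set of inequalities above is a consequence of the first two sets of inequalities.)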

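We recall that the above estimates have been obtained when $p_n = \frac{c}{n}$ with $c > 1$, and where $1 - \alpha = 1 - \alpha(c)$ is the unique root $z \in (0, 1)$ of the equation $1 - e^{-cz} - z = 0$. The reader can check that the above estimates hold uniformly for $c \in [c_1, c_2]$, for any $1 < c_1 < c_2$. Thus, consider as before a fixed $c > 1$ and $\alpha = \alpha(c)$, and let $\delta > 0$ satisfy $c - \delta > 1$. For $c' \in [c - \delta, c]$, let $\alpha' := \alpha(c')$. Then for all $\epsilon > 0$, there exist a $t_\epsilon > 0$ and an $n_\epsilon > 0$ such that for all $n \ge n_\epsilon$ and all $c' \in [c - \delta, c]$, one has for the graph $G(n, \frac{c'}{n})$,

$$\begin{aligned} &\alpha' - \epsilon < P(T \le t_\epsilon) < \alpha' + \epsilon;\\ &P\big(T > t_\epsilon,\ T \notin ((1 - \alpha' - \epsilon)n, (1 - \alpha' + \epsilon)n)\big) < \epsilon;\\ &1 - \alpha' - 2\epsilon < P\big(T \in ((1 - \alpha' - \epsilon)n, (1 - \alpha' + \epsilon)n)\big) < 1 - \alpha' + \epsilon, \end{aligned} \tag{10.48}$$

for all $n \ge n_\epsilon$.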


Return now to our graph $G(n, \frac{c}{n})$, with $n$ considerably larger than the $n_\epsilon$ in (10.48). (We will quantify “considerably larger” a bit later on.) Recall that we started out by choosing arbitrarily some vertex $x$ in the graph $G(n, \frac{c}{n})$, and then applied our algorithm, obtaining $T$, which is the size of the connected component containing $x$. Call this the first step in a “game.” If it results in $T \le t_\epsilon$, say that a “draw” occurred on the first step. If it results in $(1 - \alpha - \epsilon)n < T < (1 - \alpha + \epsilon)n$, say that a “win” occurred on the first step. Otherwise, say that a “loss” occurred on the first step. If a win or a loss occurs on this first step, we stop the procedure and say that the game ended in a win or loss, respectively. If a draw occurs, then consider the remaining $n - T$ vertices that are not in the connected component containing $x$, and consider the corresponding edges. This gives a graph of size $n' = n - T$. Note that by the definition of the algorithm, there is no pair of points in this new graph that has already been checked by the algorithm. Therefore, the conditional edge probabilities for this new graph, conditioned on having implemented the algorithm, are as before, namely $\frac{c}{n}$, independently for each edge. This edge probability can be written as $p_{n'} = \frac{c'}{n'}$, where $c' = \frac{n'}{n}c$. Now $T \le t_\epsilon$. Thus, if $n \ge n_\epsilon$ is sufficiently large, then $c' \in [c - \delta, c]$ and $n' = n - T \ge n_\epsilon$, so the estimates (10.48) (with $n$ replaced by $n'$) will hold for this new graph, which has $n'$ vertices and edge probabilities $p_{n'} = \frac{c'}{n'}$. Choose an arbitrary vertex $x_1$ from this new graph and repeat the above algorithm on the new graph. Let $T_1$ denote the random variable $T$ for this second step. If a win or a loss occurs on the second step of the game, then we stop the game and say that the game ended in a win or a loss, respectively. (Of course, here we define win, loss, and draw in terms of $T_1$, $n'$, and $\alpha'$ instead of $T$, $n$, and $\alpha$. However, the same $t_\epsilon$ is used.) If a draw occurs on this second step, then we consider the $n' - T_1 = n - T - T_1$ vertices that are neither in the connected component of $x$ nor of $x_1$. We continue like this for a maximum of $M_\epsilon$ steps, where $M_\epsilon$ is chosen sufficiently large to satisfy $\big(\alpha(c - \delta) + \epsilon\big)^{M_\epsilon} < \epsilon$. (We work with $\epsilon > 0$ sufficiently small so that $\alpha(c - \delta) + \epsilon < 1$.) The reason for this choice of $M_\epsilon$ will become clear below. If after $M_\epsilon$ steps, a win or a loss has not occurred, then we declare that the game has ended in a draw. Note that the smallest possible graph size that can ever be used in this game is $n - t_\epsilon(M_\epsilon - 1)$. The smallest modified value of $c$ that can ever be used is $\frac{n - t_\epsilon(M_\epsilon - 1)}{n}c$. We can now quantify what we meant when we said at the outset of this paragraph that we are choosing $n$ “considerably larger” than $n_\epsilon$. We choose $n$ sufficiently large so that $n - t_\epsilon(M_\epsilon - 1) \ge n_\epsilon$ and so that $\frac{n - t_\epsilon(M_\epsilon - 1)}{n}c \ge c - \delta$. Thus, the estimates in (10.48) are valid for all of the steps of the game.

It is easy to check that $\alpha = \alpha(c)$ is decreasing for $c > 1$. Thus, if the game ends in a win, then there is a connected component of size between $(1 - \alpha(c - \delta) - \epsilon)n$ and $(1 - \alpha(c) + \epsilon)n$. What is the probability that the game ends in a win? Let $W$ denote the event that the game ends in a win, let $D$ denote the event that it ends in a draw, and let $L$ denote the event that it ends in a loss. We have

$$P(W) = 1 - P(L) - P(D). \tag{10.49}$$

The game ends in a draw if there was a draw on $M_\epsilon$ consecutive steps. Since on any given step the probability of a draw is no greater than $\alpha(c - \delta) + \epsilon$, the probability of obtaining $M_\epsilon$ consecutive draws is no greater than $(\alpha(c - \delta) + \epsilon)^{M_\epsilon}$, so by the choice of $M_\epsilon$, we have

$$P(D) \le (\alpha(c - \delta) + \epsilon)^{M_\epsilon} < \epsilon. \tag{10.50}$$

Let $D^c$ denote the complement of $D$; that is, $D^c = W \cup L$. Obviously, we have $L = L \cap D^c$. Then we have

$$P(L) = P(L \cap D^c) = P(D^c)P(L \mid D^c) \le P(L \mid D^c). \tag{10.51}$$
If one played a game with three possible outcomes on each step (win, loss, or draw) with respective nonzero probabilities $p'$, $q'$, and $r'$, and the outcomes of all the steps were independent of one another, and one continued to play step after step until either a win or a loss occurred, then the probability of a win would be $\frac{p'}{p'+q'}$ and the probability of a loss would be $\frac{q'}{p'+q'}$ (Exercise 10.9). Conditioned on $D^c$, our game essentially reduces to this game. However, the probabilities of win and loss and draw are not exactly fixed, but can vary a little according to (10.48). Since by (10.48) the probability of a loss on any step is less than $\epsilon$, while the probability of a win is greater than $1 - \alpha' - 2\epsilon \ge 1 - \alpha(c - \delta) - 2\epsilon$, we can conclude that

$$P(L \mid D^c) \le \frac{\epsilon}{\epsilon + 1 - \alpha(c - \delta) - 2\epsilon} = \frac{\epsilon}{1 - \alpha(c - \delta) - \epsilon}. \tag{10.52}$$

From (10.49)–(10.52) we obtain

$$P(W) \ge 1 - \epsilon - \frac{\epsilon}{1 - \alpha(c - \delta) - \epsilon}. \tag{10.53}$$
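The bookkeeping behind (10.49) to (10.53) can be checked on the idealized game with fixed step probabilities. The sketch below is ours and assumes, unlike the proof, that $p'$, $q'$, $r'$ do not vary from step to step. With at most $M$ steps, summing the geometric series gives a win probability $\frac{p'(1 - r'^M)}{p'+q'}$, a loss probability $\frac{q'(1 - r'^M)}{p'+q'}$, and a draw probability $r'^M$ (the analogue of (10.50)); letting $M \to \infty$ recovers the values quoted for Exercise 10.9. A Monte Carlo run is included as a cross-check; the function names are hypothetical.

```python
import random


def capped_game_exact(p_win, q_loss, M):
    """Exact (win, loss, draw) probabilities when at most M i.i.d. steps are played."""
    r = 1.0 - p_win - q_loss
    decided = 1.0 - r**M                       # probability some step is a win or a loss
    return p_win * decided / (p_win + q_loss), q_loss * decided / (p_win + q_loss), r**M


def capped_game_sample(p_win, q_loss, M, rng):
    """One play of the capped game; returns 'W', 'L' or 'D'."""
    for _ in range(M):
        u = rng.random()
        if u < p_win:
            return "W"
        if u < p_win + q_loss:
            return "L"
    return "D"


rng = random.Random(11)
p_win, q_loss, M, runs = 0.6, 0.05, 8, 20000
freq = sum(capped_game_sample(p_win, q_loss, M, rng) == "W" for _ in range(runs)) / runs
print(capped_game_exact(p_win, q_loss, M), freq)
```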


In conclusion, we have demonstrated the following. Consider any $c > 1$ and any $\delta > 0$ such that $c - \delta > 1$. Then for each sufficiently small $\epsilon > 0$ and sufficiently large $n$ depending on $\epsilon$, with probability at least $1 - \epsilon - \frac{\epsilon}{1 - \alpha(c - \delta) - \epsilon}$ there will exist a connected component of $G(n, \frac{c}{n})$ of size between $(1 - \alpha(c - \delta) - \epsilon)n$ and $(1 - \alpha(c) + \epsilon)n$. If the connected component above, which has been shown to exist with probability close to 1 and which is of size around $(1 - \alpha)n$, is in fact with probability close to 1 the largest connected component, then the above estimates prove (10.1), since by (10.25) the $\beta$ defined in the statement of the theorem is in fact $1 - \alpha$. Thus, to complete the proof of (10.1) and (10.2), it suffices to prove that with probability approaching 1 as $n \to \infty$, every other component of $G(n, \frac{c}{n})$ is of size $O(\log n)$, as $n \to \infty$. In fact, we will prove here the weaker result that with probability approaching 1 as $n \to \infty$, every other component is of size $o(n)$ as $n \to \infty$. In Exercise 10.10, the reader is guided through a proof that every other component is of size $O(\log n)$.

To prove that every other component is of size $o(n)$ with probability approaching 1 as $n \to \infty$, assume to the contrary. Then for an unbounded sequence of $n$'s, the following holds. As above, with probability at least $1 - \epsilon - \frac{\epsilon}{1-\alpha(c-\delta)-\epsilon}$ there will exist a connected component of $G_n(\frac{c}{n})$ of size between $(1-\alpha(c-\delta)-\epsilon)n$ and $(1-\alpha(c)+\epsilon)n$. By our assumption, for some $\gamma > 0$, with probability at least $\gamma$ there will be another connected component of size at least $\gamma n$. We may take $\gamma < 1-\alpha(c-\delta)-\epsilon$. But if this were true, then at the first step of our algorithm, when we randomly selected a vertex $x$, the probability that it would be in a connected component of size at least $\gamma n$ would be at least

$$\Big(1 - \epsilon - \frac{\epsilon}{1-\alpha(c-\delta)-\epsilon}\Big)\frac{(1-\alpha(c-\delta)-\epsilon)n}{n} + \gamma\,\frac{\gamma n}{n}.$$

For $\epsilon$ and $\delta$ sufficiently small, this number will be larger than $1-\alpha(c)+\frac{\epsilon}{2}$. In that case the algorithm would have to give $P(T \le t_\epsilon) < \alpha(c) - \frac{\epsilon}{2}$. However, for $\epsilon > 0$ sufficiently small, this contradicts the first line of (10.47). □


Exercise 10.1. This exercise refers to Remark 3 after Theorem 10.2. Prove that for any $\epsilon > 0$ and large $n$, the number of edges of $G_n(\frac{c}{n})$ is equal to $\frac{c}{2}n + O(n^{\frac12+\epsilon})$ with high probability. Show directly that $\beta(c) < \frac{c}{2}$, for $1 < c < 2$, where $\beta(c)$ is as in Theorem 10.2.


Exercise 10.2. Let $D_n$ denote the number of disconnected vertices in the Erdős–Rényi graph $G_n(p_n)$. For this exercise, it will be convenient to represent $D_n$ as a sum of indicator random variables. Let $D_{n,i}$ be equal to 1 if the vertex $i$ is disconnected and equal to 0 otherwise. Then $D_n = \sum_{i=1}^n D_{n,i}$.

(a) Calculate $ED_n$.
(b) Calculate $ED_n^2$. (Hint: Write $ED_n^2 = E\big(\sum_{i=1}^n D_{n,i}\big)\big(\sum_{j=1}^n D_{n,j}\big)$.)
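A Monte Carlo sketch can sanity-check an answer to part (a). It samples $G_n(p)$ directly, counts the disconnected (isolated) vertices, and compares the sample mean with $n(1-p)^{n-1}$. That value is what linearity of expectation gives for $D_n = \sum_i D_{n,i}$, since a vertex is disconnected exactly when all $n-1$ of its possible edges are absent. The helper names and the parameters $n = 60$, $p = 0.05$ are illustrative choices, not taken from the text.

```python
import random

def count_isolated(n, p, rng):
    """Sample one Erdős–Rényi graph G_n(p) and return D_n, its number of disconnected vertices."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # edge {i, j} is present independently with probability p
                degree[i] += 1
                degree[j] += 1
    return sum(1 for d in degree if d == 0)

def mean_isolated(n, p, trials=200, seed=0):
    """Monte Carlo estimate of E D_n from `trials` independent samples of G_n(p)."""
    rng = random.Random(seed)
    return sum(count_isolated(n, p, rng) for _ in range(trials)) / trials

n, p = 60, 0.05
exact = n * (1 - p) ** (n - 1)  # E D_n = sum_i P(D_{n,i} = 1) = n (1 - p)^(n - 1)
estimate = mean_isolated(n, p)
print(exact, estimate)
```

The two printed numbers should agree up to sampling error of roughly 0.1 with 200 trials.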


Exercise 10.3. In this exercise, you are guided through a proof of the result noted in Remark 2 after Theorem 10.2, namely that:

If $p_n = \frac{\log n + c_n}{n}$, then as $n \to \infty$, the probability that the Erdős–Rényi graph $G_n(p_n)$ possesses at least one disconnected vertex approaches 0 if $\lim_{n\to\infty} c_n = \infty$, while for any $M$, the probability that it possesses at least $M$ disconnected vertices approaches 1 if $\lim_{n\to\infty} c_n = -\infty$.

Let $D_n$ be as in Exercise 10.2, with $p_n = \frac{\log n + c_n}{n}$.

(a) Use Exercise 10.2(a) to show that $\lim_{n\to\infty} ED_n$ equals 0 if $\lim_{n\to\infty} c_n = \infty$ and equals $\infty$ if $\lim_{n\to\infty} c_n = -\infty$. (Hint: Consider $\log ED_n$ and note that by Taylor’s remainder theorem, $\log(1-x) = -x - \frac{x^2}{2(1-x^*)^2}$ for $0 < x < 1$, where $x^* = x^*(x)$ satisfies $0 < x^* < x$.)

(b) Use (a) to show that if $\lim_{n\to\infty} c_n = \infty$, then $\lim_{n\to\infty} P(D_n = 0) = 1$.

(c) Use Exercise 10.2(b) to calculate $ED_n^2$.

(d) Show that if $\lim_{n\to\infty} c_n = -\infty$, then the variance $\sigma^2(D_n)$ satisfies $\sigma^2(D_n) = o((ED_n)^2)$. (Hint: Recall that $\sigma^2(D_n) = ED_n^2 - (ED_n)^2$.)

(e) Use Chebyshev’s inequality with (a) and (d) to conclude that if $\lim_{n\to\infty} c_n = -\infty$, then for any $M$, $\lim_{n\to\infty} P(D_n \ge M) = 1$.
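For part (a), the sketch below evaluates $ED_n = n(1-p_n)^{n-1}$ along $p_n = \frac{\log n + c}{n}$. This is the value the indicator representation of Exercise 10.2 gives. Holding $c_n \equiv c$ constant is an illustrative assumption, not part of the exercise. The computed means approach $e^{-c}$, which suggests why $ED_n \to 0$ when $c_n \to \infty$ and $ED_n \to \infty$ when $c_n \to -\infty$.

```python
import math

def expected_isolated(n, c):
    """E D_n = n (1 - p_n)^(n - 1) with the (illustrative) constant choice c_n = c."""
    p = (math.log(n) + c) / n
    return n * (1 - p) ** (n - 1)

for c in (-2.0, 0.0, 2.0):
    values = [round(expected_isolated(n, c), 4) for n in (10**3, 10**5, 10**7)]
    print(c, values, round(math.exp(-c), 4))  # the values approach e^{-c}
```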


Exercise 10.4. Recall from Chap. 5 that the probability generating function $P_X(s)$ of a nonnegative random variable $X$ taking integral values is defined by

$$P_X(s) = Es^X = \sum_{i=0}^\infty s^i P(X = i).$$

The probability generating function of a random variable $X$ uniquely characterizes its distribution, because $\frac{P_X^{(i)}(0)}{i!} = P(X = i)$.

(a) Let $X \sim \text{Bin}(n,p)$. Show that $P_X(s) = (ps + 1 - p)^n$.
(b) Let $Z \sim \text{Bin}(n,p)$, and let $Y \sim \text{Bin}(Z,p')$. By this we mean that, conditioned on $Z = m$, the random variable $Y$ is distributed according to $\text{Bin}(m,p')$. Calculate $P_Y(s)$ by writing

$$P_Y(s) = Es^Y = \sum_{m=0}^n E(s^Y \mid Z = m)\,P(Z = m),$$

and conclude that $Y \sim \text{Bin}(n, pp')$. Conclude from this that (10.7) and (10.9) imply (10.6).
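The conclusion of part (b) can be checked exactly for small $n$. The sketch computes $P(Y = k)$ by conditioning on $Z = m$, as in the displayed sum, and compares it with the $\text{Bin}(n, pp')$ probability function. It also compares the generating-function identity $P_Y(s) = P_Z(p's + 1 - p')$ at one point. The parameter values are arbitrary illustrative choices.

```python
from math import comb, isclose

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pgf_binom(n, p, s):
    """Probability generating function of Bin(n, p): (ps + 1 - p)^n."""
    return (p * s + 1 - p) ** n

def thinned_pmf(n, p, p_prime, k):
    """P(Y = k) for Y ~ Bin(Z, p'), Z ~ Bin(n, p): condition on Z = m and sum over m."""
    return sum(binom_pmf(m, p_prime, k) * binom_pmf(n, p, m) for m in range(k, n + 1))

n, p, p_prime = 12, 0.3, 0.6
for k in range(n + 1):
    assert isclose(thinned_pmf(n, p, p_prime, k), binom_pmf(n, p * p_prime, k),
                   rel_tol=1e-9, abs_tol=1e-15)

# E s^Y = E[(p's + 1 - p')^Z] = P_Z(p's + 1 - p'), which should equal the Bin(n, pp') PGF.
s = 0.4
print(pgf_binom(n, p, p_prime * s + 1 - p_prime), pgf_binom(n, p * p_prime, s))
```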


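Exercise 10.5. Let $f(\lambda) = e^{-\lambda p_0}(pe^\lambda + 1 - p)$, with $0 < p < p_0 < 1$. Show that $\inf_{\lambda \ge 0} f(\lambda)$ is attained at some $\lambda_0 > 0$ and that

$$f(\lambda_0) = \Big(\frac{p}{p_0}\Big)^{p_0}\Big(\frac{1-p}{1-p_0}\Big)^{1-p_0} \in (0,1).$$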
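A numerical check of Exercise 10.5 is sketched below. Writing $f(\lambda) = pe^{\lambda(1-p_0)} + (1-p)e^{-\lambda p_0}$ shows that $f$ is a sum of exponentials and hence convex. So a ternary search (a generic technique swapped in here, not the method the exercise intends) finds its minimizer. Setting $f'(\lambda) = 0$ gives the candidate $\lambda_0 = \log\frac{p_0(1-p)}{p(1-p_0)}$. The search interval $[0, 20]$ and the values $p = 0.3$, $p_0 = 0.5$ are illustrative assumptions.

```python
import math

def f(lam, p, p0):
    return math.exp(-lam * p0) * (p * math.exp(lam) + 1 - p)

def closed_form(p, p0):
    return (p / p0) ** p0 * ((1 - p) / (1 - p0)) ** (1 - p0)

def minimize(p, p0, lo=0.0, hi=20.0, iters=200):
    """Ternary search for the minimizer of the convex function f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1, p, p0) < f(m2, p, p0):
            hi = m2  # the minimum cannot lie in (m2, hi]
        else:
            lo = m1  # the minimum cannot lie in [lo, m1)
    return (lo + hi) / 2

p, p0 = 0.3, 0.5
lam0 = minimize(p, p0)
print(lam0, math.log(p0 * (1 - p) / (p * (1 - p0))), f(lam0, p, p0), closed_form(p, p0))
```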


Exercise 10.6. If $X \sim \text{Bin}(n,p)$, then $X$ can be represented as $X = \sum_{i=1}^n B_i$, where $\{B_i\}_{i=1}^n$ are independent and identically distributed random variables distributed according to the Bernoulli distribution with parameter $p$; that is, $P(B_i = 1) = 1 - P(B_i = 0) = p$.

(a) Use the above representation to prove that if $X_i \sim \text{Bin}(n_i, p)$, $i = 1,2$, and $n_1 \ge n_2$, then

$$P(X_1 \ge k) \ge P(X_2 \ge k), \quad \text{for all integers } k \ge 0, \qquad (10.54)$$

and that if $X_i \sim \text{Bin}(n, p_i)$, $i = 1,2$, and $p_1 \ge p_2$, then

$$P(X_1 \ge k) \ge P(X_2 \ge k), \quad \text{for all integers } k \ge 0. \qquad (10.55)$$

(Hint: For (10.54), represent $X_1$ using the random variables $\{B_i\}_{i=1}^{n_1}$ and represent $X_2$ using the first $n_2$ of these very same random variables. For (10.55), let $\{U_i\}_{i=1}^n$ be independent and identically distributed random variables, distributed according to the uniform distribution on $[0,1]$; that is, $P(a \le U_i \le b) = b - a$, for $0 \le a < b \le 1$. Define random variables $\{B_i^{(1)}\}_{i=1}^n$ and $\{B_i^{(2)}\}_{i=1}^n$ by the formulas

$$B_i^{(j)} = \begin{cases} 1, & \text{if } U_i \le p_j, \\ 0, & \text{if } U_i > p_j, \end{cases} \qquad j = 1,2.$$

Now represent $X_1$ and $X_2$ through $\{B_i^{(1)}\}_{i=1}^n$ and $\{B_i^{(2)}\}_{i=1}^n$, respectively. This method is called coupling.)

(b) Prove (10.54) and (10.55) directly from the fact that if $X \sim \text{Bin}(n,p)$, then for $0 \le k \le n$, one has $P(X \ge k) = \sum_{j=k}^n \binom{n}{j} p^j (1-p)^{n-j}$.
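The coupling in the hint can be simulated directly. The sketch draws one set of uniforms $U_1, \dots, U_n$ and uses the rule "$B_i^{(j)} = 1$ exactly when $U_i \le p_j$". This is the standard choice for this coupling, assumed here to match the hint's formulas. It then builds both binomials from the same uniforms. Because $p_1 \ge p_2$, every realization has $X_1 \ge X_2$, so the empirical tails are ordered as in (10.55). The sample sizes and parameters are illustrative.

```python
import random

def coupled_pair(n, p1, p2, rng):
    """Build X1 ~ Bin(n, p1) and X2 ~ Bin(n, p2) from the same uniforms U_1, ..., U_n."""
    us = [rng.random() for _ in range(n)]
    x1 = sum(1 for u in us if u <= p1)  # B_i^(1) = 1 iff U_i <= p1
    x2 = sum(1 for u in us if u <= p2)  # B_i^(2) = 1 iff U_i <= p2
    return x1, x2

rng = random.Random(42)
pairs = [coupled_pair(20, 0.6, 0.4, rng) for _ in range(10_000)]
assert all(x1 >= x2 for x1, x2 in pairs)  # p1 >= p2 forces X1 >= X2 in every realization
for k in range(21):
    tail1 = sum(x1 >= k for x1, _ in pairs) / len(pairs)
    tail2 = sum(x2 >= k for _, x2 in pairs) / len(pairs)
    assert tail1 >= tail2  # empirical version of (10.55)
```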
Exercise 10.7. If $\{X_n\}_{n=0}^\infty$ is a Galton–Watson branching process of the type described at the beginning of Sect. 10.5, show that $EX_{n+1} = \mu EX_n$, where $\mu$ is the mean number of offspring of a particle. (Hint: Use induction and conditional expectation.)
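A simulation makes the identity $EX_{n+1} = \mu EX_n$ (hence $EX_n = \mu^n$ when $X_0 = 1$) visible. The offspring law $\text{Bin}(3, \frac12)$, with $\mu = 1.5$, is a hypothetical choice made only for this sketch, as are the number of generations and runs.

```python
import random

def offspring(rng):
    """Hypothetical offspring law Bin(3, 1/2), so the mean number of offspring is mu = 1.5."""
    return sum(rng.random() < 0.5 for _ in range(3))

def generation_sizes(generations, rng):
    """Return [X_0, X_1, ..., X_generations] for one realization, starting from X_0 = 1."""
    sizes = [1]
    for _ in range(generations):
        # each particle of the current generation reproduces independently
        sizes.append(sum(offspring(rng) for _ in range(sizes[-1])))
    return sizes

rng = random.Random(7)
runs = [generation_sizes(5, rng) for _ in range(4000)]
means = [sum(r[g] for r in runs) / len(runs) for g in range(6)]
print([round(m, 3) for m in means], [1.5 ** g for g in range(6)])
```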


Exercise 10.8. Prove (10.37) by the method used to prove (10.36). 


Exercise 10.9. Prove the following. Suppose one plays a game with three possible outcomes on each step (win, loss, or draw), with respective nonzero probabilities $p'$, $q'$, and $r'$. Suppose the outcomes of all the steps are independent of one another, and one continues to play step after step until either a win or a loss occurs. Then the probability of a win is $\frac{p'}{p'+q'}$ and the probability of a loss is $\frac{q'}{p'+q'}$.
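One way to see Exercise 10.9 is to sum over the number $k$ of draws before the first decisive step. This gives $\sum_{k \ge 0} r'^k p' = \frac{p'}{1-r'} = \frac{p'}{p'+q'}$. The sketch compares an exact partial sum of that series (using `fractions.Fraction`) with a direct simulation of the game. The values $p' = \frac15$, $q' = \frac1{10}$ are illustrative.

```python
import random
from fractions import Fraction

def play_until_decided(p_win, q_loss, rng):
    """Play independent steps until a win or a loss occurs; return 'W' or 'L'."""
    while True:
        u = rng.random()
        if u < p_win:
            return "W"
        if u < p_win + q_loss:
            return "L"
        # otherwise the step is a draw (probability r' = 1 - p' - q'); play again

def win_prob_series(p_win, q_loss, terms):
    """Partial sum of sum_k r'^k p' (k draws followed by a win), computed exactly."""
    r = 1 - p_win - q_loss
    return sum(r**k * p_win for k in range(terms))

p, q = Fraction(1, 5), Fraction(1, 10)
rng = random.Random(3)
games = [play_until_decided(float(p), float(q), rng) for _ in range(20_000)]
print(games.count("W") / len(games), float(p / (p + q)), float(win_prob_series(p, q, 60)))
```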


Exercise 10.10. In the proof of Theorem 10.2, the algorithm for finding the connected component of a vertex was implemented a maximum of $M_\epsilon$ times, and a component with size around $(1-\alpha)n$ was found with probability close to 1. The final paragraph of the proof then showed that, with probability approaching 1 as $n \to \infty$, all other components are of size $o(n)$ as $n \to \infty$. The statement of Theorem 10.2 gives the stronger result that, with probability approaching 1 as $n \to \infty$, all other components are of size $O(\log n)$. To prove it, consider starting the algorithm all over again after the component of size around $(1-\alpha)n$ has been discovered. The number of vertices left is around $n' = \alpha n$ and the edge probability is still $\frac{c}{n}$, which we can write as $\frac{C}{n'}$ with $C \approx c\alpha$. If $C < 1$, then the method of proof of Theorem 10.1 shows that with probability approaching 1 as $n \to \infty$ all components are of size $O(\log n') = O(\log n)$ as $n \to \infty$. To show that $C < 1$, it suffices to show that $c\alpha < 1$. To prove this, use the following facts:

(1) $xe^{-x}$ increases in $[0,1)$ and decreases in $(1,\infty)$, so for $c > 1$, there exists a unique $d \in (0,1)$ such that $de^{-d} = ce^{-c}$;

(2) $\alpha = e^{-c(1-\alpha)}$.
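A numerical illustration of the two facts follows. Since $\alpha = 1$ also solves $\alpha = e^{-c(1-\alpha)}$, the sketch assumes the relevant $\alpha$ is the root in $(0,1)$. Fixed-point iteration started at 0 increases monotonically to that smallest root. It then checks that $d = c\alpha$ satisfies $de^{-d} = ce^{-c}$ and $d < 1$. The function name `alpha` and the sample values of $c$ are illustrative.

```python
import math

def alpha(c, iters=10_000):
    """Root in (0, 1) of a = exp(-c(1 - a)) for c > 1, via fixed-point iteration from a = 0."""
    a = 0.0
    for _ in range(iters):
        a = math.exp(-c * (1 - a))  # increasing iterates, bounded by the smallest root
    return a

for c in (1.5, 2.0, 3.0):
    a = alpha(c)
    d = c * a
    # fact (2) gives d e^{-d} = c e^{-c(1 - a)} e^{-c a} = c e^{-c}, and d lands in (0, 1)
    print(c, round(a, 6), round(d, 6), d * math.exp(-d), c * math.exp(-c))
```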


Chapter Notes 


The context in which Theorems 10.1 and 10.2 were originally proven by Erdős and Rényi in 1960 [18] is a little different from the context presented here. Let $N := \binom{n}{2}$. Define $G(n,M)$, $0 \le M \le N$, to be the random graph with $n$ vertices and exactly $M$ edges, where the $M$ edges are selected uniformly at random from the $N$ possible edges. One can consider an evolving random graph $\{G(n,t)\}_{t=0}^N$. By definition, $G(n,0)$ is the graph on $n$ vertices with no edges. Then sequentially, given $G(n,t)$, for $0 \le t \le N-1$, one obtains the graph $G(n,t+1)$ by choosing at random from the complete graph $K_n$ one of the edges that is not in $G(n,t)$ and adjoining it to $G(n,t)$. Erdős and Rényi looked at evolving graphs of the form $G(n,t_n)$, with $t_n = [\frac{cn}{2}]$. They showed that if $c < 1$, then with probability approaching 1 as $n \to \infty$, the largest component of $G(n,t_n)$ is of size $O(\log n)$. If $c > 1$, then with probability approaching 1 as $n \to \infty$ there is one component of size approximately $\beta(c)\cdot n$, and all other components are of size $O(\log n)$. To see how this connects up to the version given in this chapter, note that the expected number of edges in the graph $G_n(\frac{c}{n})$ is $\frac{c}{n}\binom{n}{2} = \frac{c(n-1)}{2} \approx \frac{cn}{2}$. A detailed study of the borderline case, when $t_n \sim \frac{n}{2}$ as $n \to \infty$, was undertaken by Bollobás [8]. Our proofs of Theorems 10.1 and 10.2 are along the lines of the method sketched briefly in the book of Alon and Spencer [2]. We are not aware in the literature of a complete proof of Theorems 10.1 and 10.2 with all the details.
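Below is a sketch of the evolving graph $G(n,t)$ just described. Edges are added one at a time, each chosen uniformly among the edges not yet present; this is implemented by redrawing any pair already used. A union-find structure, an implementation choice not from the text, tracks the components. The sketch compares the largest component of $G(n,t_n)$, $t_n = [\frac{cn}{2}]$, with $\beta(c)n$. Here $\beta(c)$ is taken to be the positive root of $\beta = 1 - e^{-c\beta}$, that is, $1 - \alpha$ with $\alpha = e^{-c(1-\alpha)}$ as in Exercise 10.10. The values $n = 3000$, $c = 2$ are illustrative.

```python
import math
import random

def find(parent, x):
    """Root of x in the union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def largest_component_evolving(n, m, rng):
    """Add m distinct uniformly chosen edges one at a time (G(n, 1), ..., G(n, m))
    and return the number of vertices in the largest connected component of G(n, m)."""
    assert m <= n * (n - 1) // 2
    parent, size, edges = list(range(n)), [1] * n, set()
    while len(edges) < m:
        i, j = rng.randrange(n), rng.randrange(n)
        e = (min(i, j), max(i, j))
        if i == j or e in edges:  # redraw, so the new edge is uniform among absent edges
            continue
        edges.add(e)
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # merge the two components, smaller into larger
            if size[ri] < size[rj]:
                ri, rj = rj, ri
            parent[rj] = ri
            size[ri] += size[rj]
    return max(size[find(parent, v)] for v in range(n))

def beta(c, iters=10_000):
    """Positive root of beta = 1 - exp(-c beta) for c > 1, iterating downward from beta = 1."""
    b = 1.0
    for _ in range(iters):
        b = 1 - math.exp(-c * b)
    return b

rng = random.Random(2024)
n, c = 3000, 2.0
t_n = int(c * n / 2)  # t_n = [cn/2]
print(largest_component_evolving(n, t_n, rng) / n, beta(c))
```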

The large deviations bound in Proposition 10.2 is actually tight. That is, in part (i), where $p_0 > p$, for any $\epsilon > 0$ one has, for sufficiently large $n$, $P(S_n \ge p_0 n) \ge e^{-n(K(p_0,p)+\epsilon)}$. Thus, in particular, $\lim_{n\to\infty}\frac1n\log P(S_n \ge p_0 n) = -K(p_0,p)$. Similarly, in part (ii), where $p_0 < p$, $\lim_{n\to\infty}\frac1n\log P(S_n \le p_0 n) = -K(p_0,p)$.

Consider two measures, $\mu$ and $\mu_0$, defined on a finite or countably infinite set $A$. Then $H(\mu_0;\mu) := \sum_{x\in A}\mu_0(x)\log\frac{\mu_0(x)}{\mu(x)}$ is called the relative entropy of $\mu_0$ with respect to $\mu$. It plays a fundamental role in the theory of large deviations. Suppose $A$ is a two-point set, say $A = \{0,1\}$, with $\mu(\{1\}) = 1 - \mu(\{0\}) = p$ and $\mu_0(\{1\}) = 1 - \mu_0(\{0\}) = p_0$. Then $H(\mu_0;\mu) = K(p_0,p)$. For more on large deviations, see the book by Dembo and Zeitouni [13].
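The tightness in part (i) can be checked numerically. The sketch assumes, as in the setting of Proposition 10.2, that $S_n \sim \text{Bin}(n,p)$. It computes $\frac1n\log P(S_n \ge p_0 n)$ exactly from log-probabilities, using `math.lgamma` and a log-sum-exp to avoid underflow, and compares the result with $-K(p_0,p)$. Here $K(p_0,p)$ is evaluated as the two-point relative entropy $H(\mu_0;\mu)$ above. The gap shrinks roughly like $\frac{\log n}{n}$, and the computed values never exceed $-K(p_0,p)$.

```python
import math

def K(p0, p):
    """Two-point relative entropy H(mu_0; mu) of Ber(p0) with respect to Ber(p)."""
    return p0 * math.log(p0 / p) + (1 - p0) * math.log((1 - p0) / (1 - p))

def log_upper_tail(n, p, p0):
    """log P(S_n >= p0 n) for S_n ~ Bin(n, p), summing exact log-pmf terms via log-sum-exp."""
    k0 = math.ceil(p0 * n)
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(k0, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))

p, p0 = 0.3, 0.5
for n in (100, 1000, 10000):
    print(n, log_upper_tail(n, p, p0) / n, -K(p0, p))
```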

For some basic results on the Galton–Watson branching process, using probabilistic methods, see the advanced probability textbook of Durrett [16]. Two standard texts on branching processes are the books of Harris [24] and of Athreya and Ney [7].


Appendix A 
A Quick Primer on Discrete Probability 


In this appendix, we develop some basic ideas in discrete probability theory. We 
note from the outset that some of the definitions given here are no longer correct in 
the setting of continuous probability theory. 

Let $\Omega$ be a finite or countably infinite set, and let $2^\Omega$ denote the set of subsets of $\Omega$. An element $A \in 2^\Omega$ is simply a subset of $\Omega$, but in the language of probability it is called an event. A probability measure on $\Omega$ is a function $P: 2^\Omega \to [0,1]$ satisfying $P(\emptyset) = 0$, $P(\Omega) = 1$ and which is $\sigma$-additive; that is, for any $1 \le N \le \infty$, one has $P(\cup_{n=1}^N A_n) = \sum_{n=1}^N P(A_n)$, whenever the events $\{A_n\}_{n=1}^N$ are disjoint. From this $\sigma$-additivity, it follows that $P$ is uniquely determined by $\{P(\{x\})\}_{x\in\Omega}$. Using the $\sigma$-additivity on disjoint events, it is not hard to prove that $P$ is $\sigma$-sub-additive on arbitrary events; that is, $P(\cup_{n=1}^N A_n) \le \sum_{n=1}^N P(A_n)$, for arbitrary events $\{A_n\}_{n=1}^N$. See Exercise A.1. The pair $(\Omega, P)$ is called a probability space.

If $C$ and $D$ are events and $P(C) > 0$, then the conditional probability of $D$ given $C$ is denoted by $P(D|C)$ and is defined by

$$P(D|C) = \frac{P(C \cap D)}{P(C)}.$$

Note that $P(\cdot|C)$ is itself a probability measure on $\Omega$. Two events $C$ and $D$ are called independent if $P(C \cap D) = P(C)P(D)$. Clearly then, $C$ and $D$ are independent if either $P(C) = 0$ or $P(D) = 0$. If $P(C), P(D) > 0$, it is easy to check that independence is equivalent to either of the following two equalities: $P(D|C) = P(D)$ or $P(C|D) = P(C)$. Consider a collection $\{C_n\}_{n=1}^N$ of events, with $1 \le N \le \infty$. This collection of events is said to be independent if for any finite subset $\{C_{n_j}\}_{j=1}^m$ of the events, one has $P(\cap_{j=1}^m C_{n_j}) = \prod_{j=1}^m P(C_{n_j})$.
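The definitions above can be made concrete on a small finite probability space. The sketch uses a hypothetical example, not from the text: two fair dice, with each of the 36 ordered outcomes having probability $\frac{1}{36}$. It defines $P$ by summing point masses and tests independence exactly with `fractions.Fraction`. "First die even" turns out to be independent of "sum is 7" but not of "sum is 8".

```python
from fractions import Fraction
from itertools import product

# Omega: ordered outcomes of two fair dice, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P_point = Fraction(1, len(omega))

def P(event):
    """P(A) = sum of P({w}) over w in A, with A given as a predicate on outcomes."""
    return sum(P_point for w in omega if event(w))

def P_cond(D, C):
    """Conditional probability P(D | C) = P(C and D) / P(C), assuming P(C) > 0."""
    return P(lambda w: C(w) and D(w)) / P(C)

def independent(C, D):
    return P(lambda w: C(w) and D(w)) == P(C) * P(D)

first_even = lambda w: w[0] % 2 == 0
sum_is_7 = lambda w: w[0] + w[1] == 7
sum_is_8 = lambda w: w[0] + w[1] == 8
print(independent(first_even, sum_is_7), independent(first_even, sum_is_8))
print(P_cond(sum_is_8, first_even), P(sum_is_8))  # 1/6 versus 5/36
```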

Let $(\Omega, P)$ be a probability space. A function $X: \Omega \to \mathbb{R}$ is called a (discrete, real-valued) random variable. For $B \subset \mathbb{R}$, we write $\{X \in B\}$ to denote the event $X^{-1}(B) = \{\omega \in \Omega: X(\omega) \in B\}$, the inverse image of $B$. When considering the probability of the event $\{X \in B\}$ or the event $\{X = x\}$, we write $P(X \in B)$ or $P(X = x)$, instead of $P(\{X \in B\})$ or $P(\{X = x\})$. The distribution of the random variable $X$ is the probability measure $\mu_X$ on $\mathbb{R}$ defined by $\mu_X(B) = P(X \in B)$, for $B \subset \mathbb{R}$. The function $p_X(x) := P(X = x)$ is called the probability function or the discrete density function for $X$.

The expected value or expectation $EX$ of a random variable $X$ is defined by

$$EX = \sum_{x\in\mathbb{R}} x\,P(X = x) = \sum_{x\in\mathbb{R}} x\,p_X(x), \quad \text{if } \sum_{x\in\mathbb{R}} |x|\,P(X = x) < \infty.$$

Note that the set of $x \in \mathbb{R}$ for which $P(X = x) > 0$ is either finite or countably infinite; thus, these summations are well defined. We frequently denote $EX$ by $\mu$. If $P(X \ge 0) = 1$ and the condition above in the definition of $EX$ does not hold, then we write $EX = \infty$. In the sequel, when we say that the expectation of $X$ “exists,” we mean that $\sum_{x\in\mathbb{R}} |x|\,P(X = x) < \infty$.

Given a function $\psi: \mathbb{R} \to \mathbb{R}$ and a random variable $X$, we can define a new random variable $Y = \psi(X)$. One can calculate $EY$ according to the definition of expectation above or in the following equivalent way:

$$EY = \sum_{x\in\mathbb{R}} \psi(x)\,P(X = x), \quad \text{if } \sum_{x\in\mathbb{R}} |\psi(x)|\,P(X = x) < \infty.$$


For $n \in \mathbb{N}$, the $n$th moment of $X$ is defined by

$$EX^n = \sum_{x\in\mathbb{R}} x^n\,P(X = x), \quad \text{if } \sum_{x\in\mathbb{R}} |x|^n\,P(X = x) < \infty.$$


If $\mu = EX$ exists, then one defines the variance of $X$, denoted by $\sigma^2$ or $\sigma^2(X)$ or $\text{Var}(X)$, by

$$\sigma^2 = E(X-\mu)^2 = \sum_{x\in\mathbb{R}} (x-\mu)^2\,P(X = x).$$

Of course, it is possible to have $\sigma^2 = \infty$. It is easy to check that

$$\sigma^2(X) = EX^2 - \mu^2. \qquad (A.1)$$
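A short sketch computes $EX$, $EX^2$, and $\sigma^2$ directly from a finitely supported probability function, given as a dictionary $x \mapsto P(X = x)$. It uses exact rational arithmetic, so identity (A.1) can be asserted with `==`. The fair-die example is a hypothetical illustration.

```python
from fractions import Fraction

def moments(pmf):
    """Return (EX, EX^2, sigma^2) for a finitely supported pmf {x: P(X = x)}."""
    mu = sum(x * px for x, px in pmf.items())
    ex2 = sum(x * x * px for x, px in pmf.items())
    var = sum((x - mu) ** 2 * px for x, px in pmf.items())  # E(X - mu)^2 by definition
    return mu, ex2, var

# A fair die: P(X = x) = 1/6 for x = 1, ..., 6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu, ex2, var = moments(die)
assert var == ex2 - mu**2  # identity (A.1), exactly
print(mu, ex2, var)  # 7/2 91/6 35/12
```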


Chebyshev’s inequality is a fundamental inequality involving the expected value 
and the variance. 


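Proposition A.1 (Chebyshev’s Inequality). Let $X$ be a random variable with expectation $\mu$ and finite variance $\sigma^2$. Then for all $\lambda > 0$,

$$P(|X-\mu| \ge \lambda) \le \frac{\sigma^2}{\lambda^2}.$$

Proof.

$$P(|X-\mu| \ge \lambda) = \sum_{x\in\mathbb{R}:\,|x-\mu|\ge\lambda} P(X = x) \le \sum_{x\in\mathbb{R}:\,|x-\mu|\ge\lambda} \frac{(x-\mu)^2}{\lambda^2}\,P(X = x) \le \sum_{x\in\mathbb{R}} \frac{(x-\mu)^2}{\lambda^2}\,P(X = x) = \frac{\sigma^2}{\lambda^2}.$$

□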
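To see how Chebyshev’s inequality compares with an exact probability, the sketch evaluates $P(|X-\mu| \ge \lambda)$ for a binomial random variable by direct summation. It prints this next to the bound $\frac{\sigma^2}{\lambda^2}$, using $\mu = np$ and $\sigma^2 = np(1-p)$. The parameters are illustrative. For small $\lambda$ the bound exceeds 1 and carries no information.

```python
from math import comb

def chebyshev_check(n, p, lam):
    """Return (exact P(|X - mu| >= lam), sigma^2 / lam^2) for X ~ Bin(n, p)."""
    mu, var = n * p, n * p * (1 - p)
    exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k)
                for k in range(n + 1) if abs(k - mu) >= lam)
    return exact, var / lam**2

for lam in (2, 4, 6, 8):
    exact, bound = chebyshev_check(40, 0.5, lam)
    print(lam, round(exact, 4), round(bound, 4))
```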


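Let $\{X_i\}_{i=1}^n$ be a finite collection of random variables on a probability space $(\Omega, P)$. We call $X = (X_1, \dots, X_n)$ a random vector. The joint probability function of these random variables, or equivalently, the probability function of the random vector, is given by

$$p_X(x) = p_X(x_1,\dots,x_n) := P(X_1 = x_1, \dots, X_n = x_n) = P(X = x), \quad x_i \in \mathbb{R},\ i = 1,\dots,n,$$

where $x = (x_1,\dots,x_n)$. It follows that summing $p_X(x)$ over all values of the coordinates $x_j$, $j \neq i$, gives the probability function of $X_i$:

$$\sum_{x_j \in \mathbb{R},\ j \neq i} p_X(x) = P(X_i = x_i).$$

For any function $H: \mathbb{R}^n \to \mathbb{R}$, we define

$$EH(X) = \sum_{x\in\mathbb{R}^n} H(x)\,p_X(x), \quad \text{if } \sum_{x\in\mathbb{R}^n} |H(x)|\,p_X(x) < \infty.$$

In particular then, if $EX_j$ exists, it can be written as $EX_j = \sum_{x\in\mathbb{R}^n} x_j\,p_X(x)$. Similarly, if $EX_k$ exists for all $k$, then we have

$$E\sum_{k=1}^n c_k X_k = \sum_{x\in\mathbb{R}^n}\Big(\sum_{k=1}^n c_k x_k\Big)p_X(x) = \sum_{k=1}^n c_k\Big(\sum_{x\in\mathbb{R}^n} x_k\,p_X(x)\Big).$$

It follows from this that the expectation is linear; that is, if $EX_k$ exists for $k = 1,\dots,n$, then

$$E\sum_{k=1}^n c_k X_k = \sum_{k=1}^n c_k\,EX_k,$$

for any real numbers $\{c_k\}_{k=1}^n$.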

Let $\{X_j\}_{j=1}^N$ be a collection of random variables on a probability space $(\Omega, P)$, where $1 \le N \le \infty$. The random variables are called independent if for every finite $n \le N$, one has

$$P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \prod_{j=1}^n P(X_j = x_j),$$

for all $x_j \in \mathbb{R}$, $j = 1,2,\dots,n$.


Let $\{f_i\}_{i=1}^n$ be real-valued functions with $f_i$ defined at least on the set $\{x \in \mathbb{R}: P(X_i = x) > 0\}$. Assume that $E|f_i(X_i)| < \infty$, for $i = 1,\dots,n$. From the definition of independence it is easy to show that if $\{X_i\}_{i=1}^n$ are independent, then

$$E\prod_{i=1}^n f_i(X_i) = \prod_{i=1}^n Ef_i(X_i). \qquad (A.2)$$


The variance is of course not linear. However, the variance of a sum of independent random variables is equal to the sum of the variances of the random variables:

If $\{X_i\}_{i=1}^n$ are independent random variables, then

$$\sigma^2\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n \sigma^2(X_i). \qquad (A.3)$$

It suffices to prove (A.3) for $n = 2$ and then use induction. Let $\mu_i = EX_i$, $i = 1,2$. We have

$$\sigma^2(X_1+X_2) = E\big(X_1+X_2-E(X_1+X_2)\big)^2 = E\big((X_1-\mu_1)+(X_2-\mu_2)\big)^2$$
$$= E(X_1-\mu_1)^2 + E(X_2-\mu_2)^2 + 2E(X_1-\mu_1)(X_2-\mu_2) = \sigma^2(X_1) + \sigma^2(X_2),$$

where the last equality follows because (A.2) shows that $E(X_1-\mu_1)(X_2-\mu_2) = E(X_1-\mu_1)\,E(X_2-\mu_2) = 0$.
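Identity (A.3) can be verified exactly for two independent, finitely supported random variables. Independence means the joint probability function is the product of the marginals. So the law of $X_1 + X_2$ is obtained by summing $P(X_1 = a)P(X_2 = b)$ over $a + b = s$, which is a discrete convolution. The die and the two-point law are hypothetical examples. The last line shows that additivity fails without independence, since $\sigma^2(X + X) = 4\sigma^2(X)$.

```python
from fractions import Fraction

def convolve(pmf1, pmf2):
    """pmf of X1 + X2 for independent X1, X2: sum P(X1 = a) P(X2 = b) over a + b = s."""
    out = {}
    for a, pa in pmf1.items():
        for b, pb in pmf2.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def variance(pmf):
    mu = sum(x * px for x, px in pmf.items())
    return sum((x - mu) ** 2 * px for x, px in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
two_point = {0: Fraction(2, 3), 5: Fraction(1, 3)}  # an arbitrary second law
assert variance(convolve(die, two_point)) == variance(die) + variance(two_point)  # (A.3)

# Without independence (X + X = 2X), the variances do not add: sigma^2(2X) = 4 sigma^2(X).
assert variance({2 * x: px for x, px in die.items()}) == 4 * variance(die)
```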

Chebyshev’s inequality and (A.3) allow for an exceedingly short proof of an important result: the weak law of large numbers for sums of independent, identically distributed (IID) random variables.

Theorem A.1. Let $\{X_n\}_{n=1}^\infty$ be a sequence of independent, identically distributed random variables and assume that their common variance $\sigma^2$ is finite. Denote their common expectation by $\mu$. Let $S_n = \sum_{j=1}^n X_j$. Then for any $\epsilon > 0$,

$$\lim_{n\to\infty} P\Big(\Big|\frac{S_n}{n} - \mu\Big| \ge \epsilon\Big) = 0.$$

Proof. We have $ES_n = n\mu$, and since the random variables are independent and identically distributed, it follows from (A.3) that $\sigma^2(S_n) = n\sigma^2$. Now applying Chebyshev’s inequality to $S_n$ with $\lambda = n\epsilon$ gives

$$P(|S_n - n\mu| \ge n\epsilon) \le \frac{n\sigma^2}{(n\epsilon)^2} = \frac{\sigma^2}{n\epsilon^2},$$

which proves the theorem. □
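The proof gives the explicit bound $P(|\frac{S_n}{n} - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2}$. The sketch compares it with simulated frequencies for sums of fair-die rolls, a hypothetical example with $\mu = 3.5$ and $\sigma^2 = \frac{35}{12}$. The value $\epsilon = 0.25$ and the trial counts are illustrative choices. The simulated frequencies fall well below the bound, which reflects how crude, though sufficient, Chebyshev’s inequality is here.

```python
import random

def deviation_frequency(n, eps, trials, rng):
    """Fraction of trials with |S_n / n - 3.5| >= eps, where S_n is a sum of n fair-die rolls."""
    hits = 0
    for _ in range(trials):
        s = sum(rng.randint(1, 6) for _ in range(n))
        hits += abs(s / n - 3.5) >= eps
    return hits / trials

sigma2, eps = 35 / 12, 0.25  # variance of one die roll; tolerance epsilon
rng = random.Random(11)
for n in (10, 100, 1000):
    chebyshev_bound = min(1.0, sigma2 / (n * eps**2))  # sigma^2 / (n eps^2) from the proof
    print(n, deviation_frequency(n, eps, 1000, rng), chebyshev_bound)
```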


Remark. The weak law of large numbers is a first moment result. It holds even 
without the finite variance assumption, but the proof is much more involved. 

The above weak law of large numbers is actually a particular case of the 
following weak law of large numbers. 


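Proposition A.2. Let $\{Y_n\}_{n=1}^\infty$ be random variables. Assume that

$$\sigma^2(Y_n) = o\big((EY_n)^2\big), \quad \text{as } n \to \infty.$$

Then for any $\epsilon > 0$,

$$\lim_{n\to\infty} P\Big(\Big|\frac{Y_n}{EY_n} - 1\Big| \ge \epsilon\Big) = 0.$$

Proof. By Chebyshev’s inequality, we have

$$P\Big(\Big|\frac{Y_n}{EY_n} - 1\Big| \ge \epsilon\Big) = P\big(|Y_n - EY_n| \ge \epsilon|EY_n|\big) \le \frac{\sigma^2(Y_n)}{(\epsilon EY_n)^2},$$

and the right hand side tends to 0 as $n \to \infty$ by assumption. □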


If $X$ and $Y$ are random variables on a probability space $(\Omega, P)$, and if $P(X = x) > 0$, then the conditional probability function of $Y$ given $X = x$ is defined by

$$p_{Y|X}(y|x) := P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}.$$

The conditional expectation of $Y$ given $X = x$ is defined by

$$E(Y \mid X = x) = \sum_{y\in\mathbb{R}} y\,P(Y = y \mid X = x) = \sum_{y\in\mathbb{R}} y\,p_{Y|X}(y|x), \quad \text{if } \sum_{y\in\mathbb{R}} |y|\,P(Y = y \mid X = x) < \infty.$$

It is easy to verify that

$$EY = \sum_{x\in\mathbb{R}} E(Y \mid X = x)\,P(X = x),$$

where $E(Y \mid X = x)P(X = x) := 0$ if $P(X = x) = 0$.
A random variable $X$ that takes on only two values, 0 and 1, with $P(X = 1) = p$ and $P(X = 0) = 1 - p$, for some $p \in [0,1]$, is called a Bernoulli random variable. One writes $X \sim \text{Ber}(p)$. It is trivial to check that $EX = p$ and $\sigma^2(X) = p(1-p)$.

Let $n \in \mathbb{N}$ and let $p \in [0,1]$. A random variable $X$ satisfying

$$P(X = j) = \binom{n}{j}p^j(1-p)^{n-j}, \quad j = 0,1,\dots,n,$$

is called a binomial random variable, and one writes $X \sim \text{Bin}(n,p)$. The random variable $X$ can be thought of as the number of “successes” in $n$ independent trials, where each trial has two possible outcomes, “success” and “failure,” and the probability of “success” is $p$ on each trial. Letting $\{Z_j\}_{j=1}^n$ be independent, identically distributed random variables distributed according to $\text{Ber}(p)$, it follows that $X$ can be realized as $X = \sum_{j=1}^n Z_j$. From the formula for the expected value and variance of a Bernoulli random variable, and from the linearity of the expectation and (A.3), the above representation immediately yields $EX = np$ and $\sigma^2(X) = np(1-p)$.
A random variable $X$ satisfying

$$P(X = n) = e^{-\lambda}\frac{\lambda^n}{n!}, \quad n = 0,1,\dots,$$

where $\lambda > 0$, is called a Poisson random variable, and one writes $X \sim \text{Pois}(\lambda)$. One can check easily that $EX = \lambda$ and $\sigma^2(X) = \lambda$.


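Proposition A.3 (Poisson Approximation to the Binomial Distribution). For $n \in \mathbb{N}$ and $p \in (0,1]$, let $X_{n,p} \sim \text{Bin}(n,p)$. For $\lambda > 0$, let $X_\lambda \sim \text{Pois}(\lambda)$. Then

$$\lim_{n\to\infty,\,p\to0,\,np\to\lambda} P(X_{n,p} = j) = P(X_\lambda = j), \quad j = 0,1,\dots. \qquad (A.4)$$

Proof. By assumption, we have $p = \frac{\lambda_n}{n}$, where $\lim_{n\to\infty}\lambda_n = \lambda$. We have

$$\binom{n}{j}\Big(\frac{\lambda_n}{n}\Big)^j\Big(1-\frac{\lambda_n}{n}\Big)^{n-j} = \frac{n(n-1)\cdots(n-j+1)}{j!}\Big(\frac{\lambda_n}{n}\Big)^j\Big(1-\frac{\lambda_n}{n}\Big)^{n-j} = \frac{\lambda_n^j}{j!}\,\frac{n(n-1)\cdots(n-j+1)}{n^j}\Big(1-\frac{\lambda_n}{n}\Big)^{n-j};$$

thus, since $\frac{n(n-1)\cdots(n-j+1)}{n^j} \to 1$ and $\big(1-\frac{\lambda_n}{n}\big)^{n-j} \to e^{-\lambda}$,

$$\lim_{n\to\infty,\,p\to0,\,np\to\lambda} P(X_{n,p} = j) = e^{-\lambda}\frac{\lambda^j}{j!} = P(X_\lambda = j).$$

□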
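A quick numerical look at (A.4): the sketch takes $p = \frac{\lambda}{n}$ and records the largest gap between the $\text{Bin}(n,p)$ and $\text{Pois}(\lambda)$ probability functions over $j = 0, \dots, 14$. The gap shrinks as $n$ grows. The value $\lambda = 3$ and the range of $j$ are illustrative choices.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, j):
    return comb(n, j) * p**j * (1 - p) ** (n - j)

def poisson_pmf(lam, j):
    return exp(-lam) * lam**j / factorial(j)

lam = 3.0
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n  # p -> 0 while np = lam stays fixed
    gaps[n] = max(abs(binom_pmf(n, p, j) - poisson_pmf(lam, j)) for j in range(15))
    print(n, gaps[n])
```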
Equation (A.4) is an example of weak convergence of random variables or distributions. In general, let $\{X_n\}_{n=1}^\infty$ be random variables with distributions $\{\mu_{X_n}\}_{n=1}^\infty$, and let $X$ be a random variable with distribution $\mu_X$. We say that $X_n$ converges weakly to $X$, or $\mu_{X_n}$ converges weakly to $\mu_X$, if $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$ for all $x \in \mathbb{R}$ for which $P(X = x) = 0$. Equivalently, $\lim_{n\to\infty}\mu_{X_n}((-\infty,x]) = \mu_X((-\infty,x])$ for all $x \in \mathbb{R}$ for which $\mu_X(\{x\}) = 0$. For example, suppose $P(X_n = \frac1n) = P(X_n = 1 + \frac1n) = \frac12$, for $n = 1,2,\dots$, and $P(X = 0) = P(X = 1) = \frac12$. Then $X_n$ converges weakly to $X$, since $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$ for all $x \in \mathbb{R} - \{0,1\}$. See also Exercise A.4.


Exercise A.1. Use the $\sigma$-additivity property of probability measures on disjoint sets to prove $\sigma$-sub-additivity on arbitrary sets: that is, $P(\cup_{n=1}^N A_n) \le \sum_{n=1}^N P(A_n)$, for arbitrary events $\{A_n\}_{n=1}^N$, where $1 \le N \le \infty$. (Hint: Rewrite $\cup_{n=1}^N A_n$ as a disjoint union $\cup_{n=1}^N B_n$, by letting $B_1 = A_1$, $B_2 = A_2 - A_1$, $B_3 = A_3 - A_2 - A_1$, etc.)


Exercise A.2. Prove that $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$, for arbitrary events $A_1, A_2$. Then prove more generally that for any finite $n$ and arbitrary events $\{A_k\}_{k=1}^n$, one has

$$P(\cup_{k=1}^n A_k) = \sum_{1\le i\le n} P(A_i) - \sum_{1\le i<j\le n} P(A_i \cap A_j) + \sum_{1\le i<j<k\le n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1}P(A_1 \cap A_2 \cap \cdots \cap A_n).$$

This result is known as the principle of inclusion–exclusion.
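The principle of inclusion–exclusion can be tested exactly on a finite space. The sketch uses the uniform measure on $\Omega = \{0, \dots, N-1\}$ and a few randomly generated events. It sums $(-1)^{r-1}P(A_{i_1} \cap \cdots \cap A_{i_r})$ over all sub-collections of size $r$ and compares the total with the probability of the union. The uniform measure and the random events are illustrative assumptions.

```python
import random
from fractions import Fraction
from itertools import combinations

def prob(event, omega_size):
    """Uniform probability on Omega = {0, ..., omega_size - 1}."""
    return Fraction(len(event), omega_size)

def inclusion_exclusion(events, omega_size):
    """Alternating sum over all intersections of r of the events, r = 1, ..., n."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for group in combinations(events, r):
            inter = set.intersection(*group)
            total += (-1) ** (r - 1) * prob(inter, omega_size)
    return total

rng = random.Random(5)
N = 30
events = [set(rng.sample(range(N), rng.randint(0, N))) for _ in range(4)]
assert inclusion_exclusion(events, N) == prob(set().union(*events), N)
```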


Exercise A.3. Let $(\Omega, P)$ be a probability space and let $R \ge 2$ be an integer. For $A \subset \Omega$, recall that the complement $A^c$ of $A$ is defined by $A^c = \Omega - A$. Prove that if the events $\{A_k\}_{k=1}^R$ are independent, then the complementary events $\{A_k^c\}_{k=1}^R$ are also independent. (Hint: By the definition of independence, we have

$$P(\cap_{j=1}^l B_j) = \prod_{j=1}^l P(B_j), \quad \text{for any } l \le R \text{ and any sub-collection } \{B_j\}_{j=1}^l \text{ of } \{A_k\}_{k=1}^R. \qquad (A.5)$$

Using this, we need to prove that $P(\cap_{j=1}^l B_j^c) = \prod_{j=1}^l P(B_j^c)$, for any sub-collection $\{B_j\}_{j=1}^l$ of $\{A_k\}_{k=1}^R$. Let $p_j = P(B_j)$ and $p = P(\cap_{j=1}^l B_j^c)$. Then we need to prove that $p = \prod_{j=1}^l (1 - p_j)$. Write

$$p = 1 - P(\cup_{j=1}^l B_j) = 1 - \sum_{1\le i\le l} P(B_i) + \sum_{1\le i<j\le l} P(B_i \cap B_j) - \cdots,$$

and use (A.5) along with the principle of inclusion–exclusion, which appears in Exercise A.2.)


Exercise A.4. Using (A.4), show that

$$\lim_{n\to\infty,\,p\to0,\,np\to\lambda} P(X_{n,p} \le x) = P(X_\lambda \le x), \quad \text{for all } x \in \mathbb{R}.$$


Appendix B 
Power Series and Generating Functions 


We review without proof some basic results concerning power series. For more 
details, the reader should consult an advanced calculus or undergraduate analysis 
text. We also illustrate the utility of generating functions by analyzing the one that 
arises from the Fibonacci sequence. 

Let $\{a_n\}_{n=0}^\infty$ be a sequence of real numbers. Define formally the generating function $F(t)$ of $\{a_n\}_{n=0}^\infty$ by

$$F(t) = \sum_{n=0}^\infty a_n t^n, \qquad (B.1)$$

where $t \in \mathbb{R}$. We say “formally” because we have made the definition before determining for which values of $t$ the power series on the right hand side above converges. The power series converges trivially for $t = 0$, and it is possible that it converges only for $t = 0$, for example, if $a_n = n!$.

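The power series $\sum_{n=0}^\infty a_n t^n$ converges absolutely if $\sum_{n=0}^\infty |a_n t^n| < \infty$. The power series is uniformly, absolutely convergent for $|t| \le \rho$ if

$$\lim_{N\to\infty}\ \sup_{|t|\le\rho}\ \sum_{n=N}^\infty |a_n t^n| = 0;$$

that is, if the tail of the series $\sum_{n=0}^\infty |a_n t^n|$ converges to 0 uniformly over $|t| \le \rho$.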
We state four fundamental results concerning the convergence of power series: 


1. If the power series converges for some number $t_0 \neq 0$, then necessarily the power series converges absolutely and uniformly for $|t| \le \rho$, for all $\rho < |t_0|$.

2. There exists an extended real number $r_0 \in [0,\infty]$ such that the power series $\sum_{n=0}^\infty a_n t^n$ converges absolutely if $|t| < r_0$ and diverges if $|t| > r_0$.
The number $r_0$ in (2) is called the radius of convergence of the power series.

3. The radius of convergence is given by the formula (with the conventions $\frac10 = \infty$ and $\frac1\infty = 0$)

$$r_0 = \frac{1}{\limsup_{n\to\infty} \sqrt[n]{|a_n|}}.$$

4. If the power series is uniformly, absolutely convergent for $|t| \le \rho$, then the function $F(t)$ in (B.1) is infinitely differentiable for $|t| < \rho$. Its derivatives are obtained via term by term differentiation in the power series; in particular, $F'(t) = \sum_{n=1}^\infty n a_n t^{n-1}$.


The generating function often provides an efficient method for obtaining information about the sequence $\{a_n\}_{n=0}^\infty$. Typically, this will occur when the generating function can be written in a nice closed form and analyzed. This analysis then allows one to obtain information about the coefficients in the generating function’s power series expansion, and these coefficients are of course $\{a_n\}_{n=0}^\infty$. We illustrate this in the case of the famous Fibonacci sequence.

Recall that the sequence of Fibonacci numbers is defined recursively by $f_0 = 0$, $f_1 = 1$ and

$$f_n = f_{n-1} + f_{n-2}, \quad \text{for } n \ge 2. \qquad (B.2)$$

The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144.
We will obtain a closed form for the generating function

$$F(t) = \sum_{n=0}^\infty f_n t^n \qquad (B.3)$$

of the Fibonacci numbers. Multiply both sides of (B.2) by $t^n$ and then sum both sides over $n$, with $n$ running from 2 to $\infty$. This gives us

$$\sum_{n=2}^\infty f_n t^n = \sum_{n=2}^\infty f_{n-1}t^n + \sum_{n=2}^\infty f_{n-2}t^n.$$

Since $f_0 = 0$ and $f_1 = 1$, the left hand side above is equal to $F(t) - t$. Factoring out $t$ from the first term and $t^2$ from the second term on the right hand side above, and using the fact that $f_0 = 0$, one sees that the right hand side above is equal to $tF(t) + t^2F(t)$. Thus, we obtain the equation

$$F(t) - t = tF(t) + t^2F(t),$$


which gives a closed form expression for $F$; namely, $F(t) = \frac{t}{1-t-t^2}$. Up until now we have ignored the question of convergence. However, the above formula gives us the answer. The roots of the polynomial $t^2 + t - 1$ are $r^+ := \frac{-1+\sqrt5}{2}$ and $r^- := \frac{-1-\sqrt5}{2}$. Since $|r^+| < |r^-|$, we conclude that the generating function $F(t)$ has radius of convergence $|r^+| = \frac{\sqrt5-1}{2}$. Thus, the generating function of the Fibonacci sequence is given by

$$F(t) = \frac{t}{1-t-t^2}, \quad |t| < \frac{\sqrt5-1}{2}. \qquad (B.4)$$
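We now use the method of partial fractions to represent the function $\frac{t}{1-t-t^2}$ in an explicit power series. Using the fact that $r^+r^- = -1$, we write

$$t^2 + t - 1 = (t - r^+)(t - r^-) = -(tr^- + 1)(tr^+ + 1);$$

thus,

$$\frac{t}{1-t-t^2} = \frac{t}{(tr^- + 1)(tr^+ + 1)}. \qquad (B.5)$$

For unknown $A$ and $B$, we write

$$\frac{t}{(tr^- + 1)(tr^+ + 1)} = \frac{A}{tr^- + 1} + \frac{B}{tr^+ + 1} = \frac{t(Ar^+ + Br^-) + (A + B)}{(tr^- + 1)(tr^+ + 1)}. \qquad (B.6)$$

Comparing the left-most and right-most terms in (B.6), we conclude that $A + B = 0$ and $Ar^+ + Br^- = 1$. Solving for $A$ and $B$, we obtain $A = \frac{1}{r^+ - r^-} = \frac{1}{\sqrt5}$ and $B = -\frac{1}{r^+ - r^-} = -\frac{1}{\sqrt5}$. Thus, from (B.5) and the first equality in (B.6), we arrive at the partial fraction representation

$$\frac{t}{1-t-t^2} = \frac{1}{\sqrt5}\Big(\frac{1}{1+tr^-} - \frac{1}{1+tr^+}\Big). \qquad (B.7)$$

Since $|r^-| > |r^+|$, both $\frac{1}{1+tr^-}$ and $\frac{1}{1+tr^+}$ can be written as geometric series if $|t| < \frac{1}{|r^-|} = \frac{2}{1+\sqrt5} = \frac{\sqrt5-1}{2}$. We have

$$\frac{1}{1+tr^-} = \sum_{n=0}^\infty (-1)^n (r^-)^n t^n, \qquad \frac{1}{1+tr^+} = \sum_{n=0}^\infty (-1)^n (r^+)^n t^n. \qquad (B.8)$$

Thus, from (B.4), (B.7), and (B.8), we obtain

$$F(t) = \sum_{n=0}^\infty \frac{1}{\sqrt5}\Big(\Big(\frac{1+\sqrt5}{2}\Big)^n - \Big(\frac{1-\sqrt5}{2}\Big)^n\Big)t^n. \qquad (B.9)$$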
Comparing (B.3) with (B.9), we conclude that the $n$th Fibonacci number $f_n$ is given explicitly by

$$f_n = \frac{1}{\sqrt5}\Big(\Big(\frac{1+\sqrt5}{2}\Big)^n - \Big(\frac{1-\sqrt5}{2}\Big)^n\Big). \qquad (B.10)$$

From the explicit formula in (B.10), the asymptotic behavior of $f_n$ is clear:

$$f_n \sim \frac{1}{\sqrt5}\Big(\frac{1+\sqrt5}{2}\Big)^n, \quad \text{as } n \to \infty.$$
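The three descriptions of the Fibonacci numbers in this appendix can be cross-checked in a few lines: the recursion (B.2), the power series coefficients of $\frac{t}{1-t-t^2}$, and the explicit formula (B.10). The coefficients are extracted by matching powers of $t$ in $F(t)(1-t-t^2) = t$, which gives $a_n = [n = 1] + a_{n-1} + a_{n-2}$. Floating-point evaluation of (B.10) is rounded to the nearest integer, which is reliable for the small $n$ used here.

```python
import math

def fib_recursive(count):
    """f_0, ..., f_{count-1} from the recursion (B.2)."""
    f = [0, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]

def fib_from_series(count):
    """Coefficients of t / (1 - t - t^2), from F(t)(1 - t - t^2) = t term by term."""
    a = []
    for n in range(count):
        rhs = 1 if n == 1 else 0  # coefficient of t^n on the right-hand side t
        a.append(rhs + (a[n - 1] if n >= 1 else 0) + (a[n - 2] if n >= 2 else 0))
    return a

def binet(n):
    """The explicit formula (B.10), evaluated in floating point."""
    s5 = math.sqrt(5)
    return (((1 + s5) / 2) ** n - ((1 - s5) / 2) ** n) / s5

fibs = fib_recursive(40)
assert fib_from_series(40) == fibs
assert all(round(binet(n)) == fibs[n] for n in range(40))
print(fibs[:13])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
```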


Appendix C 
A Proof of Stirling’s Formula 


Stirling’s formula states that

$$n! \sim n^n e^{-n}\sqrt{2\pi n}, \quad \text{as } n \to \infty. \qquad (C.1)$$

In order to obtain an asymptotic formula for the discrete quantity $n!$, it is extremely useful to be able to embed this quantity in a function of a continuous variable. Integrating by parts and then applying induction shows that $n! = \Gamma(n+1)$, $n \in \mathbb{N}$, where the gamma function $\Gamma(t)$ is defined by

$$\Gamma(t) = \int_0^\infty x^{t-1}e^{-x}\,dx, \quad t > 0.$$

Thus, one proves Stirling’s formula in the following form.
Theorem C.1 (Stirling’s Formula).

$$\Gamma(t+1) \sim t^t e^{-t}\sqrt{2\pi t}, \quad \text{as } t \to \infty. \qquad (C.2)$$
Proof. In the literature one can find literally dozens of proofs of Stirling’s formula. We present here an elementary proof that uses Laplace’s asymptotic method [14]. We begin by giving the intuition for the method. We write

$$\Gamma(t+1) = \int_0^\infty e^{\psi_t(x)}\,dx, \qquad (C.3)$$

where

$$\psi_t(x) = t\log x - x.$$
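Now $\psi_t$ takes on its maximum at $x = t$, and the Taylor expansion of $\psi_t$ about $x = t$ starts out as

$$t\log t - t - \frac{(x-t)^2}{2t}.$$

Replacing $\psi_t$ by this quadratic approximation, we calculate that

$$\int_0^\infty e^{\psi_t(x)}\,dx \approx \int_0^\infty e^{t\log t - t - \frac{(x-t)^2}{2t}}\,dx = t^t e^{-t}\int_0^\infty e^{-\frac{(x-t)^2}{2t}}\,dx.$$

Making the substitution $z = \frac{x-t}{\sqrt t}$ gives

$$t^t e^{-t}\int_0^\infty e^{-\frac{(x-t)^2}{2t}}\,dx = t^t e^{-t}\sqrt t\int_{-\sqrt t}^\infty e^{-\frac{z^2}{2}}\,dz.$$

Since $\int_{-\infty}^\infty e^{-\frac{z^2}{2}}\,dz = \sqrt{2\pi}$, we conclude that

$$\int_0^\infty e^{\psi_t(x)}\,dx \sim t^t e^{-t}\sqrt{2\pi t}, \quad \text{as } t \to \infty.$$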
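We now turn to the rigorous proof. We can write $\psi_t$ exactly as

$$\psi_t(t+y) = t\log t - t - t\,g\Big(\frac{y}{t}\Big),$$

where

$$g(v) = v - \log(1+v).$$

Substituting this in (C.3) and making the change of variables $x = y + t$, we obtain

$$\Gamma(t+1) = t^t e^{-t}\int_{-t}^\infty e^{-t g(\frac{y}{t})}\,dy.$$

Making the change of variables $y = \sqrt t\,z$, we have

$$\Gamma(t+1) = t^t e^{-t}\sqrt{2\pi t}\;I(t), \qquad (C.4)$$

where

$$I(t) = \frac{1}{\sqrt{2\pi}}\int_{-\sqrt t}^\infty e^{-t g(\frac{z}{\sqrt t})}\,dz.$$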
We will show that

$$\lim_{t\to\infty} I(t) = 1. \qquad (C.5)$$

Now (C.2) follows from (C.4) and (C.5).
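Fix $L > 0$ and write, for $t > L^2$,

$$I(t) = I_L^-(t) + I_L(t) + I_L^+(t), \qquad (C.6)$$

where

$$I_L(t) = \frac{1}{\sqrt{2\pi}}\int_{-L}^{L} e^{-t g(\frac{z}{\sqrt t})}\,dz$$

and

$$I_L^+(t) = \frac{1}{\sqrt{2\pi}}\int_{L}^{\infty} e^{-t g(\frac{z}{\sqrt t})}\,dz, \qquad I_L^-(t) = \frac{1}{\sqrt{2\pi}}\int_{-\sqrt t}^{-L} e^{-t g(\frac{z}{\sqrt t})}\,dz.$$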

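From Taylor’s remainder formula it follows that for any $\epsilon > 0$ and sufficiently small $v$, one has

$$\tfrac12(1-\epsilon)v^2 \le g(v) \le \tfrac12(1+\epsilon)v^2.$$

Thus, $\lim_{t\to\infty} t\,g(\frac{z}{\sqrt t}) = \frac{z^2}{2}$, uniformly over $z \in [-L, L]$; consequently,

$$\lim_{t\to\infty} I_L(t) = \frac{1}{\sqrt{2\pi}}\int_{-L}^{L} e^{-\frac{z^2}{2}}\,dz. \qquad (C.7)$$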


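Since $\big(t\,g(\frac{z}{\sqrt t})\big)' = \sqrt t\Big(1 - \frac{1}{1+\frac{z}{\sqrt t}}\Big) = \frac{\sqrt t\,z}{\sqrt t + z}$ is increasing in $z$, we have

$$I_L^+(t) \le \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt t + L}{\sqrt t\,L}\int_L^\infty \frac{\sqrt t\,z}{\sqrt t + z}\,e^{-t g(\frac{z}{\sqrt t})}\,dz = \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt t + L}{\sqrt t\,L}\,e^{-\sqrt t L + t\log(1+\frac{L}{\sqrt t})}.$$

By Taylor’s formula, we have $\log(1+\frac{L}{\sqrt t}) = \frac{L}{\sqrt t} - \frac{L^2}{2t} + O(t^{-\frac32})$ as $t \to \infty$; thus,

$$\limsup_{t\to\infty} I_L^+(t) \le \frac{1}{\sqrt{2\pi}}\,\frac{1}{L}\,e^{-\frac{L^2}{2}}. \qquad (C.8)$$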
100 L 
A very similar argument gives

$$\limsup_{t\to\infty} I_L^-(t) \le \frac{1}{\sqrt{2\pi}}\,\frac{1}{L}\,e^{-\frac{L^2}{2}}. \qquad (C.9)$$

Now from (C.6)–(C.9), we obtain

$$\frac{1}{\sqrt{2\pi}}\int_{-L}^{L} e^{-\frac{z^2}{2}}\,dz \le \liminf_{t\to\infty} I(t) \le \limsup_{t\to\infty} I(t) \le \frac{1}{\sqrt{2\pi}}\int_{-L}^{L} e^{-\frac{z^2}{2}}\,dz + \frac{2}{L\sqrt{2\pi}}\,e^{-\frac{L^2}{2}}.$$

Since $I(t)$ is independent of $L$, letting $L \to \infty$ above gives (C.5). □
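The convergence in (C.2) is easy to watch numerically. The sketch computes the ratio $\Gamma(t+1)/(t^t e^{-t}\sqrt{2\pi t})$ in log space with `math.lgamma`, to avoid overflow for large $t$. The ratio stays above 1 and decreases toward 1, roughly like $1 + \frac{1}{12t}$. The sample values of $t$ are illustrative.

```python
import math

def stirling_ratio(t):
    """Gamma(t + 1) / (t^t e^{-t} sqrt(2 pi t)), computed via logarithms."""
    log_gamma = math.lgamma(t + 1)
    log_approx = t * math.log(t) - t + 0.5 * math.log(2 * math.pi * t)
    return math.exp(log_gamma - log_approx)

for t in (1, 10, 100, 1000, 10**6):
    print(t, stirling_ratio(t))
```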


Appendix D
An Elementary Proof of $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$


The standard way to prove the identity in the title of this appendix is via Fourier series. We give a completely elementary proof, following [1]. Consider the double integral

$$I = \int_0^1\int_0^1 \frac{1}{1-xy}\,dx\,dy. \qquad (D.1)$$

(Actually, the expression on the right hand side of (D.1) is an improper integral, because the integrand blows up at $(x,y) = (1,1)$. Thus, $\int_0^1\int_0^1 \frac{1}{1-xy}\,dx\,dy := \lim_{\epsilon\to0^+}\int_0^{1-\epsilon}\int_0^{1-\epsilon}\frac{1}{1-xy}\,dx\,dy$. Since the integrand is nonnegative, there is no problem applying the standard rules of calculus directly to $\int_0^1\int_0^1 \frac{1}{1-xy}\,dx\,dy$.) On the one hand, expanding the integrand in a geometric series and integrating term by term gives

$$I = \int_0^1\int_0^1 \sum_{n=0}^\infty (xy)^n\,dx\,dy = \sum_{n=0}^\infty \int_0^1\int_0^1 x^n y^n\,dx\,dy = \sum_{n=0}^\infty\Big(\int_0^1 x^n\,dx\Big)\Big(\int_0^1 y^n\,dy\Big) = \sum_{n=0}^\infty \frac{1}{(n+1)^2} = \sum_{n=1}^\infty \frac{1}{n^2}. \qquad (D.2)$$

(The interchanging of the order of the integration and the summation is justified by the fact that all the summands are nonnegative.)

On the other hand, consider the change of variables $u = \frac{y+x}{2}$, $v = \frac{y-x}{2}$. This transformation rotates the square $[0,1]\times[0,1]$ clockwise by $45^\circ$ and shrinks its sides by the factor $\sqrt2$. The new domain is $\{(u,v): 0 \le u \le \frac12,\ -u \le v \le u\} \cup \{(u,v): \frac12 \le u \le 1,\ u-1 \le v \le 1-u\}$. The Jacobian $\frac{\partial(x,y)}{\partial(u,v)}$ of the transformation is equal to 2, so the area element $dx\,dy$ gets replaced by $2\,du\,dv$. The function $\frac{1}{1-xy}$ becomes $\frac{1}{1-u^2+v^2}$. Since the function and the domain are symmetric with respect to the $u$-axis, we have
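$$I = 4\int_0^{\frac12}\Big(\int_0^u \frac{dv}{1-u^2+v^2}\Big)du + 4\int_{\frac12}^1\Big(\int_0^{1-u}\frac{dv}{1-u^2+v^2}\Big)du.$$

Using the integration formula $\int \frac{dx}{a^2+x^2} = \frac{1}{a}\arctan\frac{x}{a}$, we obtain

$$I = 4\int_0^{\frac12}\frac{1}{\sqrt{1-u^2}}\arctan\Big(\frac{u}{\sqrt{1-u^2}}\Big)du + 4\int_{\frac12}^1\frac{1}{\sqrt{1-u^2}}\arctan\Big(\frac{1-u}{\sqrt{1-u^2}}\Big)du.$$

Now the derivative of $g(u) := \arctan\big(\frac{u}{\sqrt{1-u^2}}\big)$ is $\frac{1}{\sqrt{1-u^2}}$, and the derivative of $h(u) := \arctan\big(\frac{1-u}{\sqrt{1-u^2}}\big) = \arctan\big(\sqrt{\frac{1-u}{1+u}}\big)$ is $-\frac12\frac{1}{\sqrt{1-u^2}}$. Thus, we conclude that

$$I = 4\int_0^{\frac12} g(u)g'(u)\,du - 8\int_{\frac12}^1 h(u)h'(u)\,du = 2g^2(u)\Big|_0^{\frac12} - 4h^2(u)\Big|_{\frac12}^1$$
$$= 2\Big(\arctan^2\frac{1}{\sqrt3} - \arctan^2 0\Big) - 4\Big(\arctan^2 0 - \arctan^2\frac{1}{\sqrt3}\Big) = 6\arctan^2\frac{1}{\sqrt3} = 6\Big(\frac{\pi}{6}\Big)^2 = \frac{\pi^2}{6}. \qquad (D.3)$$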

Comparing (D.2) and (D.3) gives $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$.
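Both sides of the argument can be checked numerically. The sketch compares a partial sum of $\sum \frac{1}{n^2}$, whose tail after $N$ terms is about $\frac1N$, with a midpoint-rule approximation of the improper integral (D.1) and with $\frac{\pi^2}{6}$. Midpoints never touch the singular corner $(1,1)$, but convergence near that corner is slow, so only about three correct digits should be expected for the integral. The grid size and number of terms are illustrative.

```python
import math

def basel_partial(N):
    """Partial sum of 1/n^2 for n = 1, ..., N."""
    return sum(1 / n**2 for n in range(1, N + 1))

def double_integral_midpoint(m):
    """Midpoint rule on an m x m grid for the integral (D.1) of 1/(1 - xy) over [0,1]^2."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * h
        for j in range(m):
            y = (j + 0.5) * h
            total += 1.0 / (1.0 - x * y)
    return total * h * h

print(basel_partial(10**6), double_integral_midpoint(400), math.pi**2 / 6)
```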
Index

A

Abel summation, 77
arcsine distribution, 37 
average order, 13 


B 

Bernoulli random variable, 138 

binomial random variable, 138 

branching process, see Galton–Watson
branching process, 117


C 

Chebyshev’s ψ-function, 70
Chebyshev’s θ-function, 68
Chebyshev’s inequality, 134 
Chebyshev’s theorem, 68 
Chinese remainder theorem, 19 
clique, 89 

coloring of a graph, 104 
composition of an integer, 5 
cycle index, 58 

cycle type, 51 


D 
derangement, 49 
Dyck path, 40 


E 

Erdős–Rényi graph, 89
Euler φ-function, 11

Euler product formula, 19 
Ewens sampling formula, 52 
expected value, 134 
extinction, 117 


F 
Fibonacci sequence, 142 
finite graph, 89 


G 

Galton–Watson branching process, 117
generating function, 141 

giant component, 110 


H 
Hardy–Ramanujan theorem, 81


I 
independent events, 133 
independent random variables, 135 


L 
large deviations, 113 


M 

Mertens’ theorems, 75 
Möbius function, 8
Möbius inversion, 10
multiplicative function, 9 


P 

p-adic, 71 

partition of an integer, 1

Poisson approximation to the binomial 
distribution, 138 

Poisson random variable, 138 

prime number theorem, 67 
probabilistic method, 107
probability generating function, 54 
probability space, 133 


R 

Ramsey number, 105 

random variable, 133 

relative entropy, 115, 131 
restricted partition of an integer, 1


S 

sieve method, 19 

simple, symmetric random walk, 35 
square-free integer, 8 

Stirling numbers of the first kind, 54 


Stirling’s formula, 145 
survival, 117 


T 
tampering detection, 99 
total variation distance, 99 


V
variance, 134 


W
weak convergence, 139 
weak law of large numbers, 136, 137 